{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T04:30:33Z","timestamp":1772166633346,"version":"3.50.1"},"reference-count":21,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2020,11,12]],"date-time":"2020-11-12T00:00:00Z","timestamp":1605139200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,11,12]],"date-time":"2020-11-12T00:00:00Z","timestamp":1605139200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Big Data"],"published-print":{"date-parts":[[2020,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>In recent years, deep learning has become one of the most important topics in computer sciences. Deep learning is a growing trend in the edge of technology and its applications are now seen in many aspects of our life such as object detection, speech recognition, natural language processing, etc. Currently, almost all major sciences and technologies are benefiting from the advantages of deep learning such as high accuracy, speed and flexibility. Therefore, any efforts in improving performance of related techniques is valuable. Deep learning accelerators are considered as hardware architecture, which are designed and optimized for increasing speed, efficiency and accuracy of computers that are running deep learning algorithms. In this paper, after reviewing some backgrounds on deep learning, a well-known accelerator architecture named MAERI (Multiply-Accumulate Engine with Reconfigurable interconnects) is investigated. 
The performance of a deep learning task is measured and compared under two different dataflow strategies, NLR (No Local Reuse) and NVDLA (NVIDIA Deep Learning Accelerator), using an open-source tool called MAESTRO (Modeling Accelerator Efficiency via Spatio-Temporal Resource Occupancy). The measured performance indicators show that the novel optimized architecture, NVDLA, achieves higher L1 and L2 computation reuse and a lower total runtime (cycles) than NLR.<\/jats:p>","DOI":"10.1186\/s40537-020-00377-8","type":"journal-article","created":{"date-parts":[[2020,11,12]],"date-time":"2020-11-12T08:03:02Z","timestamp":1605168182000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["Deep learning accelerators: a case study with MAESTRO"],"prefix":"10.1186","volume":"7","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0698-6141","authenticated-orcid":false,"given":"Hamidreza","family":"Bolhasani","sequence":"first","affiliation":[]},{"given":"Somayyeh Jafarali","family":"Jassbi","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2020,11,12]]},"reference":[{"key":"377_CR1","doi-asserted-by":"publisher","first-page":"85","DOI":"10.1016\/j.neunet.2014.09.003","volume":"61","author":"S Jurgen","year":"2015","unstructured":"Jurgen S. Deep learning in neural networks: an overview. Neural Netw. 2015;61:85\u2013117.","journal-title":"Neural Netw"},{"key":"377_CR2","first-page":"14","volume-title":"Neural networks: an introduction","author":"B Muller","year":"2012","unstructured":"Muller B, Reinhardt J, Strickland MT. Neural networks: an introduction. Berlin: Springer; 2012. p. 14\u20135."},{"key":"377_CR3","doi-asserted-by":"publisher","first-page":"436","DOI":"10.1038\/nature14539","volume":"521","author":"L Yann","year":"2015","unstructured":"Yann L, Yoshua B, Geoffrey H. Deep learning. Nature. 
2015;521:436\u201344.","journal-title":"Nature"},{"key":"377_CR4","first-page":"3","volume":"7","author":"D Li","year":"2014","unstructured":"Li D, Dong Y. Deep learning: methods and applications. Found Trends Signal Process. 2014;7:3\u20134.","journal-title":"Found Trends Signal Process"},{"key":"377_CR5","unstructured":"Jianqing F, Cong M, Yiqiao Z. A selective overview of deep learning. arXiv:1904.05526[stat.ML]. 2019."},{"key":"377_CR6","unstructured":"Wenyan L, et al. FlexFlow: a flexible dataflow accelerator architecture for convolutional neural networks. In: IEEE international symposium on high performance computer architecture. 2017."},{"key":"377_CR7","unstructured":"Kartik H, et al. Morph: flexible acceleration for 3D CNN-based video understanding. In: 51st annual IEEE\/ACM international symposium on microarchitecture (MICRO). 2018."},{"key":"377_CR8","unstructured":"Tianshi C, et al. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning. In: ACM SIGARCH Computer Architecture News; 2014."},{"key":"377_CR9","unstructured":"Zidong D, et al. ShiDianNao: shifting vision processing closer to the sensor. In: ACM\/IEEE 42nd annual international symposium on computer architecture (ISCA). 2015."},{"key":"377_CR10","doi-asserted-by":"publisher","first-page":"367","DOI":"10.1145\/3007787.3001177","volume":"44","author":"Y-H Chen","year":"2016","unstructured":"Chen Y-H, Emer J, Sze V. Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks. ACM SIGARCH Computer Architecture News. 2016;44:367\u201379.","journal-title":"ACM SIGARCH Computer Architecture News"},{"key":"377_CR11","unstructured":"Michael P, et al. Buffets: an efficient and composable storage idiom for explicit decoupled data orchestration. In: ASPLOS '19: proceedings of the twenty-fourth international conference on architectural support for programming languages and operating systems. 
2019, p. 137."},{"key":"377_CR12","doi-asserted-by":"publisher","first-page":"210","DOI":"10.1016\/j.comnet.2019.02.006","volume":"161","author":"X Xia","year":"2019","unstructured":"Xia X, Marcin W, Fan X, Damasevicius R, Li Y. Multi-sink distributed power control algorithm for cyber-physical systems in coal mine tunnels. Comput Netw. 2019;161:210\u20139.","journal-title":"Comput Netw"},{"issue":"2","key":"377_CR13","first-page":"100","volume":"408","author":"H Song","year":"2017","unstructured":"Song H, Li W, Shen P, Vasilakos A. Gradient-driven parking navigation using a continuous information potential field based on wireless sensor network. Inf Sci. 2017;408(2):100\u201314.","journal-title":"Inf Sci"},{"key":"377_CR14","doi-asserted-by":"publisher","first-page":"64","DOI":"10.1016\/j.patcog.2019.03.009","volume":"92","author":"Z Bin","year":"2019","unstructured":"Bin Z, Dawid P, Marcin W. A regional adaptive variational PDE model for computed tomography image reconstruction. Pattern Recogn. 2019;92:64\u201381. https:\/\/doi.org\/10.1016\/j.patcog.2019.03.009.","journal-title":"Pattern Recogn."},{"key":"377_CR15","unstructured":"Hyoukjun K, Ananda S, Tushar K. MAERI: enabling flexible dataflow mapping over DNN accelerators via reconfigurable interconnects. In: ASPLOS \u201918, Proceedings of the twenty-third international conference on architectural support for programming languages and operating systems."},{"key":"377_CR16","unstructured":"Utku A, Shane O, Davor C, Andrew C. L, and Gordon R. C. An OpenCL deep learning accelerator on Arria 10. In: FPGA \u201917, Proceedings of the 2017 ACM\/SIGDA international symposium on field programmable gate arrays, pp. 55\u201364."},{"key":"377_CR17","unstructured":"Hyoukjun K, Michael P, Tushar K. MAESTRO: an open-source infrastructure for modeling dataflows within deep learning accelerators. arXiv:1805.02566. 2018."},{"key":"377_CR18","unstructured":"Karen S, Andrew Z. 
Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556. 2015."},{"key":"377_CR19","unstructured":"NVDLA Deep Learning Accelerator, https:\/\/nvdla.org. 2017."},{"key":"377_CR20","unstructured":"George SE. The anatomy and physiology of the human stress response. A clinical guide to the treatment of the human stress response. Berlin: Springer, pp 19\u201356."},{"key":"377_CR21","unstructured":"Christian S, Alexander T, Dumitru E. Deep neural networks for object detection. Advances in neural information processing systems 26, NIPS 2013."}],"container-title":["Journal of Big Data"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-020-00377-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s40537-020-00377-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-020-00377-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,11,12]],"date-time":"2020-11-12T08:22:05Z","timestamp":1605169325000},"score":1,"resource":{"primary":{"URL":"https:\/\/journalofbigdata.springeropen.com\/articles\/10.1186\/s40537-020-00377-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,11,12]]},"references-count":21,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,12]]}},"alternative-id":["377"],"URL":"https:\/\/doi.org\/10.1186\/s40537-020-00377-8","relation":{"has-preprint":[{"id-type":"doi","id":"10.21203\/rs.3.rs-24147\/v2","asserted-by":"object"},{"id-type":"doi","id":"10.21203\/rs.3.rs-24147\/v1","asserted-by":"object"},{"id-type":"doi","id":"10.21203\/rs.3.rs-24147\/v3","asserted-by":"object"}]},"ISSN":["2196-1115"],"issn-type":[{"value":"2196-1115","type":"
electronic"}],"subject":[],"published":{"date-parts":[[2020,11,12]]},"assertion":[{"value":"13 May 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 November 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 November 2020","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Evaluating a deep learning accelerator\u2019s performance.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"100"}}