{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2022,4,5]],"date-time":"2022-04-05T07:41:35Z","timestamp":1649144495572},"reference-count":26,"publisher":"Institute of Electronics, Information and Communications Engineers (IEICE)","issue":"11","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IEICE Trans. Fundamentals"],"published-print":{"date-parts":[[2021,11,1]]},"DOI":"10.1587\/transfun.2020kep0003","type":"journal-article","created":{"date-parts":[[2021,5,31]],"date-time":"2021-05-31T22:18:39Z","timestamp":1622499519000},"page":"1488-1498","source":"Crossref","is-referenced-by-count":0,"title":["Evaluation Metrics for the Cost of Data Movement in Deep Neural Network Acceleration"],"prefix":"10.1587","volume":"E104.A","author":[{"given":"Hongjie","family":"XU","sequence":"first","affiliation":[{"name":"Kyoto University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jun","family":"SHIOMI","sequence":"additional","affiliation":[{"name":"Kyoto University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hidetoshi","family":"ONODERA","sequence":"additional","affiliation":[{"name":"Kyoto University"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"532","reference":[{"key":"1","doi-asserted-by":"crossref","unstructured":"[1] H. Xu, J. Shiomi, and H. Onodera, \u201cOn-chip memory optimized CNN accelerator with efficient partial-sum accumulation,\u201d Proc. 30th Edition on Great Lakes Symposium on VLSI, ser. GLSVLSI'20, Beijing, China, 2020. 10.1145\/3386263.3406925","DOI":"10.1145\/3386263.3406925"},{"key":"2","doi-asserted-by":"publisher","unstructured":"[2] L. Yann, B. Yoshua, and H. Geoffrey, \u201cDeep Learning,\u201d Nature, vol.521, no.7553, pp.436-444, May 2015. 10.1038\/nature14539","DOI":"10.1038\/nature14539"},{"key":"3","doi-asserted-by":"crossref","unstructured":"[3] K. 
Vissers, \u201cVersal: The Xilinx adaptive compute acceleration platform (ACAP),\u201d Proc. 2019 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays, ser. FPGA'19, ACM, New York, NY, USA, pp.83-83, 2019. 10.1145\/3289602.3294007","DOI":"10.1145\/3289602.3294007"},{"key":"4","unstructured":"[4] Intel, Neural Compute Stick 2, 2019. [Online]. Available: https:\/\/software.intel.com\/en-us\/articles\/OpenVINO-RelNotes"},{"key":"5","unstructured":"[5] Google, Edge TPU, 2019. [Online]. Available: https:\/\/cloud.google.com\/edge-tpu\/"},{"key":"6","doi-asserted-by":"publisher","unstructured":"[6] D. Silver, A. Huang, C.J. Maddison, A. Guez, L. Sifre, G.V. Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, \u201cMastering the game of go with deep neural networks and tree search,\u201d Nature, vol.529, no.7587, pp.484-489, Jan. 2016. 10.1038\/nature16961","DOI":"10.1038\/nature16961"},{"key":"7","unstructured":"[7] A.G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, \u201cMobileNets: Efficient convolutional neural networks for mobile vision applications,\u201d CoRR, vol.abs\/1704.04861, 2017."},{"key":"8","doi-asserted-by":"crossref","unstructured":"[8] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, \u201cGoing deeper with convolutions,\u201d 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1-9, June 2015. 10.1109\/cvpr.2015.7298594","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"9","doi-asserted-by":"publisher","unstructured":"[9] V. Sze, Y. Chen, T. Yang, and J.S. Emer, \u201cEfficient processing of deep neural networks: A tutorial and survey,\u201d Proc. IEEE, vol.105, no.12, pp.2295-2329, Dec. 2017. 
10.1109\/jproc.2017.2761740","DOI":"10.1109\/JPROC.2017.2761740"},{"key":"10","doi-asserted-by":"publisher","unstructured":"[10] Y. Chen, T. Krishna, J.S. Emer, and V. Sze, \u201cEyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks,\u201d IEEE J. Solid-State Circuits, vol.52, no.1, pp.127-138, Jan 2017. 10.1109\/jssc.2016.2616357","DOI":"10.1109\/JSSC.2016.2616357"},{"key":"11","unstructured":"[11] Y.N. Wu, J.S. Emer, and V. Sze, \u201cAccelergy: An architecture-level energy estimation methodology for accelerator designs,\u201d 2019 IEEE\/ACM International Conference on Computer-Aided Design (ICCAD), pp.1-8, 2019. 10.1109\/iccad45719.2019.8942149"},{"key":"12","doi-asserted-by":"crossref","unstructured":"[12] X. Yang, M. Gao, Q. Liu, J. Setter, J. Pu, A. Nayak, S. Bell, K. Cao, H. Ha, P. Raina, C. Kozyrakis, and M. Horowitz, \u201cInterstellar: Using Halide's scheduling language to analyze DNN accelerators,\u201d Proc. Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, ser. ASPLOS'20, pp.369-383, Association for Computing Machinery, New York, NY, USA, 2020. [Online]. Available: https:\/\/doi.org\/10.1145\/3373376.3378514 10.1145\/3373376.3378514","DOI":"10.1145\/3373376.3378514"},{"key":"13","doi-asserted-by":"crossref","unstructured":"[13] A. Parashar, P. Raina, Y. Shao, Y. Chen, V.A. Ying, A. Mukkara, R. Venkatesan, B. Khailany, S.W. Keckler, and J. Emer, \u201cTimeloop: A systematic approach to DNN accelerator Evaluation,\u201d 2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp.304-315, IEEE Computer Society, Los Alamitos, CA, USA, March 2019. [Online]. Available: https:\/\/doi.ieeecomputersociety.org\/10.1109\/ISPASS.2019.00042","DOI":"10.1109\/ISPASS.2019.00042"},{"key":"14","doi-asserted-by":"crossref","unstructured":"[14] H. Kwon, P. Chatarasi, M. Pellauer, A. Parashar, V. Sarkar, and T. 
Krishna, \u201cUnderstanding reuse, performance, and hardware cost of DNN dataflow: A data-centric approach,\u201d Proc. 52nd Annual IEEE\/ACM International Symposium on Microarchitecture, ser. MICRO'52, pp.754-768, Association for Computing Machinery, New York, NY, USA, 2019. [Online]. Available: https:\/\/doi.org\/10.1145\/3352460.3358252 10.1145\/3352460.3358252","DOI":"10.1145\/3352460.3358252"},{"key":"15","doi-asserted-by":"crossref","unstructured":"[15] Y. Zhao, C. Li, Y. Wang, P. Xu, Y. Zhang, and Y. Lin, \u201cDNN-chip predictor: An analytical performance predictor for DNN accelerators with various dataflows and hardware architectures,\u201d ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.1593-1597, 2020. 10.1109\/icassp40776.2020.9053977","DOI":"10.1109\/ICASSP40776.2020.9053977"},{"key":"16","doi-asserted-by":"publisher","unstructured":"[16] J. Jo, S. Kim, and I. Park, \u201cEnergy-efficient convolution architecture based on rescheduled dataflow,\u201d IEEE Trans. Circuits Syst. I, Reg. Papers, vol.65, no.12, pp.4196-4207, Dec. 2018. 10.1109\/tcsi.2018.2840092","DOI":"10.1109\/TCSI.2018.2840092"},{"key":"17","doi-asserted-by":"publisher","unstructured":"[17] J. Jo, S. Cha, D. Rho, and I. Park, \u201cDSIP: A scalable inference accelerator for convolutional neural networks,\u201d IEEE J. Solid-State Circuits, vol.53, no.2, pp.605-618, Feb. 2018. 10.1109\/jssc.2017.2764045","DOI":"10.1109\/JSSC.2017.2764045"},{"key":"18","unstructured":"[18] A. Krizhevsky, I. Sutskever, and G.E. Hinton, \u201cImageNet classification with deep convolutional neural networks,\u201d Advances in Neural Information Processing Systems 25, F. Pereira, C.J.C. Burges, L. Bottou, and K.Q. Weinberger, eds., Curran Associates, pp.1097-1105, 2012."},{"key":"19","unstructured":"[19] K. Simonyan and A. Zisserman, \u201cVery deep convolutional networks for large-scale image recognition,\u201d arXiv 1409.1556, Sept. 
2014."},{"key":"20","doi-asserted-by":"crossref","unstructured":"[20] T. Yang, Y. Chen, and V. Sze, \u201cDesigning energy-efficient convolutional neural networks using energy-aware pruning,\u201d 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.6071-6079, 2017. 10.1109\/cvpr.2017.643","DOI":"10.1109\/CVPR.2017.643"},{"key":"21","doi-asserted-by":"publisher","unstructured":"[21] Y. Chen, T. Yang, J. Emer, and V. Sze, \u201cEyeriss v2: A flexible accelerator for emerging deep neural networks on mobile devices,\u201d IEEE J. Emerg. Sel. Topics in Circuits Syst., vol.9, no.2, pp.292-308, 2019. 10.1109\/jetcas.2019.2910232","DOI":"10.1109\/JETCAS.2019.2910232"},{"key":"22","doi-asserted-by":"crossref","unstructured":"[22] C. Zhang, P. Li, G. Sun, Y. Guan, B. Xiao, and J. Cong, \u201cOptimizing FPGA-based accelerator design for deep convolutional neural networks,\u201d Proc. 2015 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays, ser. FPGA'15, Association for Computing Machinery, New York, NY, USA, pp.161-170, 2015. [Online]. Available: https:\/\/doi.org\/10.1145\/2684746.2689060 10.1145\/2684746.2689060","DOI":"10.1145\/2684746.2689060"},{"key":"23","doi-asserted-by":"publisher","unstructured":"[23] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, \u201cGradient-based learning applied to document recognition,\u201d Proc. IEEE, vol.86, no.11, pp.2278-2324, Nov 1998. 10.1109\/5.726791","DOI":"10.1109\/5.726791"},{"key":"24","doi-asserted-by":"crossref","unstructured":"[24] K. He, X. Zhang, S. Ren, and J. Sun, \u201cDeep residual learning for image recognition,\u201d 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.770-778, June 2016. 10.1109\/cvpr.2016.90","DOI":"10.1109\/CVPR.2016.90"},{"key":"25","doi-asserted-by":"publisher","unstructured":"[25] S.J.E. Wilton and N. Jouppi, \u201cCACTI: An enhanced cache access and cycle time model,\u201d J. 
Solid State Circuits, vol.31, no.5, pp.677-688, May 1996. 10.1109\/4.509850","DOI":"10.1109\/4.509850"},{"key":"26","doi-asserted-by":"crossref","unstructured":"[26] N.P. Jouppi, C. Young, N. Patil, D. Patterson, G. Agrawal, R. Bajwa, S. Bates, S. Bhatia, N. Boden, A. Borchers, R. Boyle, P. Cantin, C. Chao, C. Clark, J. Coriell, M. Daley, M. Dau, J. Dean, B. Gelb, T.V. Ghaemmaghami, R. Gottipati, W. Gulland, R. Hagmann, C.R. Ho, D. Hogberg, J. Hu, R. Hundt, D. Hurt, J. Ibarz, A. Jaffey, A. Jaworski, A. Kaplan, H. Khaitan, A. Koch, N. Kumar, S. Lacy, J. Laudon, J. Law, D. Le, C. Leary, Z. Liu, K. Lucke, A. Lundin, G. MacKean, A. Maggiore, M. Mahony, K. Miller, R. Nagarajan, R. Narayanaswami, R. Ni, K. Nix, T. Norrie, M. Omernick, N. Penukonda, A. Phelps, J. Ross, M. Ross, A. Salek, E. Samadiani, C. Severn, G. Sizikov, M. Snelham, J. Souter, D. Steinberg, A. Swing, M. Tan, G. Thorson, B. Tian, H. Toma, E. Tuttle, V. Vasudevan, R. Walter, W. Wang, E. Wilcox, and D.H. Yoon, \u201cIn-datacenter Performance Analysis of A Tensor Processing Unit,\u201d 2017 ACM\/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), pp.1-12, June 2017. 
10.1145\/3079856.3080246","DOI":"10.1145\/3079856.3080246"}],"container-title":["IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.jstage.jst.go.jp\/article\/transfun\/E104.A\/11\/E104.A_2020KEP0003\/_pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,11,6]],"date-time":"2021-11-06T03:19:14Z","timestamp":1636168754000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.jstage.jst.go.jp\/article\/transfun\/E104.A\/11\/E104.A_2020KEP0003\/_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,11,1]]},"references-count":26,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2021]]}},"URL":"https:\/\/doi.org\/10.1587\/transfun.2020kep0003","relation":{},"ISSN":["0916-8508","1745-1337"],"issn-type":[{"value":"0916-8508","type":"print"},{"value":"1745-1337","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,11,1]]},"article-number":"2020KEP0003"}}