{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:31:01Z","timestamp":1750221061624,"version":"3.41.0"},"reference-count":30,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2018,10,31]],"date-time":"2018-10-31T00:00:00Z","timestamp":1540944000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["J. Emerg. Technol. Comput. Syst."],"published-print":{"date-parts":[[2018,10,31]]},"abstract":"<jats:p>In recent years, deep learning has become widespread for various real-world recognition tasks. In addition to recognition accuracy, energy efficiency and speed (i.e., performance) are other grand challenges to enable local intelligence in edge devices. In this article, we investigate the adoption of monolithic three-dimensional (3D) IC (M3D) technology for deep learning hardware design, using speech recognition as a test vehicle. M3D has recently proven to be one of the leading contenders to address the power, performance, and area (PPA) scaling challenges in advanced technology nodes. Our study encompasses the influence of key parameters in DNN hardware implementations towards their performance and energy efficiency, including DNN architectural choices, underlying workloads, and tier partitioning choices in M3D designs. Our post-layout M3D designs, together with hardware-efficient sparse algorithms, produce power savings and performance improvement beyond what can be achieved using conventional 2D ICs. Experimental results show that M3D offers 22.3% iso-performance power saving and 6.2% performance improvement, convincingly demonstrating its entitlement as a solution for DNN ASICs. 
We further present architectural and physical design guidelines for M3D DNNs to maximize the benefits.<\/jats:p>","DOI":"10.1145\/3273956","type":"journal-article","created":{"date-parts":[[2018,11,27]],"date-time":"2018-11-27T13:18:59Z","timestamp":1543324739000},"page":"1-19","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Power, Performance, and Area Benefit of Monolithic 3D ICs for On-Chip Deep Neural Networks Targeting Speech Recognition"],"prefix":"10.1145","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8513-9890","authenticated-orcid":false,"given":"Kyungwook","family":"Chang","sequence":"first","affiliation":[{"name":"Georgia Institute of Technology, Atlanta, GA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Deepak","family":"Kadetotad","sequence":"additional","affiliation":[{"name":"Arizona State University, Tempe, AZ, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yu","family":"Cao","sequence":"additional","affiliation":[{"name":"Arizona State University, Tempe, AZ, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jae-Sun","family":"Seo","sequence":"additional","affiliation":[{"name":"Arizona State University, Tempe, AZ, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sung Kyu","family":"Lim","sequence":"additional","affiliation":[{"name":"Georgia Institute of Technology, Atlanta, GA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2018,11,27]]},"reference":[{"volume-title":"Proceedings of the IEEE International Electron Devices Meeting.","author":"Batude P.","key":"e_1_2_1_1_1"},{"volume-title":"Proceedings of the International Symposium on Low Power Electronics and Design.","author":"Chang 
K.","key":"e_1_2_1_2_1"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/2966986.2967013"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/2897937.2898043"},{"key":"e_1_2_1_5_1","unstructured":"Y. Cheng D. Wang P. Zhou and T. Zhang. 2017. A survey of model compression and acceleration for deep neural networks. (2017). arXiv:1710.09282."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.327"},{"key":"e_1_2_1_7_1","doi-asserted-by":"crossref","unstructured":"A. Conneau D. Kiela H. Schwenk L. Barrault and A. Bordes. 2017. Supervised learning of universal sentence representations from natural language inference data. (2017). arXiv:1705.02364.","DOI":"10.18653\/v1\/D17-1070"},{"volume-title":"Binaryconnect: Training deep neural networks with binary weights during propagations. In Advances in Neural Information Processing Systems.","year":"2015","author":"Courbariaux M.","key":"e_1_2_1_8_1"},{"key":"e_1_2_1_9_1","unstructured":"M. Courbariaux I. Hubara D. Soudry R. El-Yaniv and Y. Bengio. 2016. Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1. (2016). 
arXiv:1602.02830."},{"volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing.","author":"Deng L.","key":"e_1_2_1_10_1"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1016\/0165-1684(84)90013-6"},{"key":"e_1_2_1_12_1","doi-asserted-by":"crossref","unstructured":"J. S. Garofolo L. F. Lamel W. M. Fisher J. G. Fiscus and D. S. Pallett. 1993. DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus. NASA STI\/Recon Technical Report N.","DOI":"10.6028\/NIST.IR.4930"},{"volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing.","author":"Graves A.","key":"e_1_2_1_13_1"},{"key":"e_1_2_1_14_1","unstructured":"S. Gray A. Radford and D. Kingma. 2017. GPU Kernels for Block-Sparse Weights. Technical Report. OpenAI."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3020078.3021745"},{"volume-title":"Proceedings of the International Conference on Learning Representations.","author":"Han S.","key":"e_1_2_1_16_1"},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.","author":"He K.","key":"e_1_2_1_17_1"},{"volume-title":"IEEE International Conference on Acoustics, Speech and Signal Processing.","author":"He T.","key":"e_1_2_1_18_1"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2966986.2967028"},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.","author":"Karpathy A.","key":"e_1_2_1_20_1"},{"key":"e_1_2_1_21_1","unstructured":"A. Krizhevsky I. Sutskever and G. E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. 
In Advances in Neural Information Processing Systems."},{"volume-title":"Proceedings of the IEEE International Conference on Computer-Aided Design.","author":"Liao S.","key":"e_1_2_1_22_1"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2005.850860"},{"volume-title":"Proceedings of the IEEE SOI-3D-Subthreshold Microelectronics Technology Unified Conference.","author":"Nayak D. K.","key":"e_1_2_1_24_1"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2627369.2627642"},{"volume-title":"Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop.","author":"Povey D.","key":"e_1_2_1_26_1"},{"volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing.","author":"Su D.","key":"e_1_2_1_27_1"},{"key":"e_1_2_1_28_1","unstructured":"V. Sze Y. Chen J. Emer A. Suleiman and Z. Zhang. 2016. Hardware for machine learning: Challenges and opportunities. (2016). arXiv:1612.07625."},{"key":"e_1_2_1_29_1","doi-asserted-by":"crossref","unstructured":"W. Xiong J. Droppo X. Huang F. Seide M. Seltzer A. Stolcke D. Yu and G. Zweig. 2016. The Microsoft 2016 conversational speech recognition system. (2016). arXiv:1609.03528.","DOI":"10.1109\/ICASSP.2017.7953159"},{"key":"e_1_2_1_30_1","doi-asserted-by":"crossref","unstructured":"S. Yin G. Srivastava S. K. Venkataramanaiah C. Chakrabarti V. Berisha and J. Seo. 2018. 
Minimizing area and energy of deep learning hardware design using collective low precision and structured compression. (2018). arXiv:1804.07370.","DOI":"10.1109\/ACSSC.2017.8335696"}],"container-title":["ACM Journal on Emerging Technologies in Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3273956","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3273956","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T00:44:44Z","timestamp":1750207484000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3273956"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,10,31]]},"references-count":30,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2018,10,31]]}},"alternative-id":["10.1145\/3273956"],"URL":"https:\/\/doi.org\/10.1145\/3273956","relation":{},"ISSN":["1550-4832","1550-4840"],"issn-type":[{"type":"print","value":"1550-4832"},{"type":"electronic","value":"1550-4840"}],"subject":[],"published":{"date-parts":[[2018,10,31]]},"assertion":[{"value":"2017-12-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-08-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-11-27","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}