{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,13]],"date-time":"2026-02-13T23:15:20Z","timestamp":1771024520687,"version":"3.50.1"},"reference-count":53,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2023,1,24]],"date-time":"2023-01-24T00:00:00Z","timestamp":1674518400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"NSF CCF","award":["1815899"],"award-info":[{"award-number":["1815899"]}]},{"name":"NSF CSR","award":["1815780"],"award-info":[{"award-number":["1815780"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2023,3,31]]},"abstract":"<jats:p>\n            As the machine learning and systems communities strive to achieve higher energy efficiency through custom deep neural network (DNN) accelerators, varied precision or quantization levels, and model compression techniques, there is a need for design space exploration frameworks that incorporate quantization-aware processing elements into the accelerator design space while having accurate and fast power, performance, and area models. In this work, we present\n            <jats:italic>QUIDAM<\/jats:italic>\n            , a highly parameterized quantization-aware DNN accelerator and model co-exploration framework. Our framework can facilitate future research on design space exploration of DNN accelerators for various design choices such as bit precision, processing element type, scratchpad sizes of processing elements, global buffer size, number of total processing elements, and DNN configurations. Our results show that different bit precisions and processing element types lead to significant differences in terms of performance per area and energy. 
Specifically, our framework identifies a wide range of design points where performance per area and energy varies more than 5\u00d7 and 35\u00d7, respectively. With the proposed framework, we show that lightweight processing elements achieve on par accuracy results and up to 5.7\u00d7 more performance per area and energy improvement when compared to the best 16-bit integer quantization\u2013based implementation. Finally, due to the efficiency of the pre-characterized power, performance, and area models, QUIDAM can speed up the design exploration process by three to four orders of magnitude as it removes the need for expensive synthesis and characterization of each design.\n          <\/jats:p>","DOI":"10.1145\/3555807","type":"journal-article","created":{"date-parts":[[2022,9,1]],"date-time":"2022-09-01T11:48:35Z","timestamp":1662032915000},"page":"1-21","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["QUIDAM: A Framework for\n            <u>Qu<\/u>\n            ant\n            <u>i<\/u>\n            zation-aware\n            <u>D<\/u>\n            NN\n            <u>A<\/u>\n            ccelerator and\n            <u>M<\/u>\n            odel Co-Exploration"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0971-0282","authenticated-orcid":false,"given":"Ahmet","family":"Inci","sequence":"first","affiliation":[{"name":"Carnegie Mellon University, Pittsburgh, PA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5811-6506","authenticated-orcid":false,"given":"Siri","family":"Virupaksha","sequence":"additional","affiliation":[{"name":"Carnegie Mellon University, Pittsburgh, PA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5734-8065","authenticated-orcid":false,"given":"Aman","family":"Jain","sequence":"additional","affiliation":[{"name":"Carnegie Mellon University, Pittsburgh, PA, 
USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2953-0489","authenticated-orcid":false,"given":"Ting-Wu","family":"Chin","sequence":"additional","affiliation":[{"name":"Carnegie Mellon University, Pittsburgh, PA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7894-4667","authenticated-orcid":false,"given":"Venkata","family":"Thallam","sequence":"additional","affiliation":[{"name":"Carnegie Mellon University, Pittsburgh, PA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4311-3761","authenticated-orcid":false,"given":"Ruizhou","family":"Ding","sequence":"additional","affiliation":[{"name":"Carnegie Mellon University, Pittsburgh, PA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5734-4221","authenticated-orcid":false,"given":"Diana","family":"Marculescu","sequence":"additional","affiliation":[{"name":"Carnegie Mellon University, Pittsburgh, PA and The University of Texas at Austin, Austin, TX, USA"}]}],"member":"320","published-online":{"date-parts":[[2023,1,24]]},"reference":[{"key":"e_1_3_1_2_2","first-page":"622","article-title":"Neuralpower: Predict and deploy energy-efficient convolutional neural networks","author":"Cai Ermao","year":"2017","unstructured":"Ermao Cai, Da-Cheng Juan, Dimitrios Stamoulis, and Diana Marculescu. 2017. Neuralpower: Predict and deploy energy-efficient convolutional neural networks. 
In Proceedings of the Asian Conference on Machine Learning (ACML\u201917), 622\u2013637.","journal-title":"Proceedings of the Asian Conference on Machine Learning (ACML\u201917)"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2016.40"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2017.54"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2016.2616357"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00159"},{"key":"e_1_3_1_7_2","volume-title":"Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT\u201919)","author":"Devlin J.","year":"2019","unstructured":"J. Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT\u201919)."},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/3270689"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/3060403.3060465"},{"key":"e_1_3_1_10_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Esser Steven K.","year":"2020","unstructured":"Steven K. Esser, Jeffrey L. McKinstry, Deepika Bablani, Rathinakumar Appuswamy, and Dharmendra S. Modha. 2020. Learned step size quantization. In Proceedings of the International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=rkgO66VKDS."},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/3093337.3037702"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/DAC18074.2021.9586216"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58517-4_32"},{"key":"e_1_3_1_14_2","unstructured":"Suyog Gupta and Berkin Akin. 2020. 
Accelerator-aware neural network design using AutoML. arXiv preprint arXiv:2003.02838 2020. Retrieved from https:\/\/arxiv.org\/abs\/2003.02838."},{"key":"e_1_3_1_15_2","article-title":"EIE: Efficient inference engine on compressed deep neural network","author":"Han Song","year":"2016","unstructured":"Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, and William J. Dally. 2016. EIE: Efficient inference engine on compressed deep neural network. In Proceedings of the International Conference on Computer Architecture (ISCA\u201916).","journal-title":"Proceedings of the International Conference on Computer Architecture (ISCA\u201916)"},{"key":"e_1_3_1_16_2","article-title":"Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding","author":"Han Song","year":"2016","unstructured":"Song Han, Huizi Mao, and William J Dally. 2016. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. In Proceedings of the International Conference on Learning Representations (ICLR\u201916).","journal-title":"Proceedings of the International Conference on Learning Representations (ICLR\u201916)"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.5555\/3304889.3304970"},{"key":"e_1_3_1_19_2","article-title":"The architectural implications of distributed reinforcement learning on CPU-GPU systems","author":"Inci Ahmet","year":"2020","unstructured":"Ahmet Inci, Evgeny Bolotin, Yaosheng Fu, Gal Dalal, Shie Mannor, David Nellans, and Diana Marculescu. 2020. The architectural implications of distributed reinforcement learning on CPU-GPU systems. arXiv:2012.04210. Retrieved from https:\/\/arxiv.org\/abs\/2012.04210.","journal-title":"arXiv:2012.04210"},{"key":"e_1_3_1_20_2","unstructured":"Ahmet Inci Mehmet Meric Isgenc and Diana Marculescu. 2021. 
Cross-layer design space exploration of NVM-based caches for deep learning. In Proceedings of the 12th Non-Volatile Memories Workshop (NVMW). Retrieved from http:\/\/nvmw.ucsd.edu\/nvmw2021-program\/nvmw2021-data\/nvmw2021-paper37-final_version_your_extended_abstract.pdf."},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2021.3127148"},{"key":"e_1_3_1_22_2","article-title":"Efficient deep learning using non-volatile memory technology","author":"Inci Ahmet","year":"2022","unstructured":"Ahmet Inci, Mehmet Meric Isgenc, and Diana Marculescu. 2022. Efficient deep learning using non-volatile memory technology. arXiv:2206.13601. Retrieved from https:\/\/arxiv.org\/abs\/2206.13601.","journal-title":"arXiv:2206.13601"},{"key":"e_1_3_1_23_2","article-title":"Solving the non-volatile memory conundrum for deep learning workloads","author":"Inci Ahmet","year":"2018","unstructured":"Ahmet Inci and Diana Marculescu. 2018. Solving the non-volatile memory conundrum for deep learning workloads. In Proceedings of the Architectures and Systems for Big Data Workshop in Conjunction with ISCA (2018).","journal-title":"Proceedings of the Architectures and Systems for Big Data Workshop in Conjunction with ISCA"},{"key":"e_1_3_1_24_2","article-title":"QADAM: Quantization-aware DNN accelerator modeling for pareto-optimality","author":"Inci Ahmet","year":"2022","unstructured":"Ahmet Inci, Siri Garudanagiri Virupaksha, Aman Jain, Venkata Vivek Thallam, Ruizhou Ding, and Diana Marculescu. 2022. QADAM: Quantization-aware DNN accelerator modeling for pareto-optimality. arXiv:2205.13045. Retrieved from https:\/\/arxiv.org\/abs\/2205.13045.","journal-title":"arXiv:2205.13045"},{"key":"e_1_3_1_25_2","article-title":"QAPPA: Quantization-aware power, performance, and area modeling of DNN accelerators","author":"Inci Ahmet","year":"2022","unstructured":"Ahmet Inci, Siri Garudanagiri Virupaksha, Aman Jain, Venkata Vivek Thallam, Ruizhou Ding, and Diana Marculescu. 2022. 
QAPPA: Quantization-aware power, performance, and area modeling of DNN accelerators. arXiv:2205.08648. Retrieved from https:\/\/arxiv.org\/abs\/2205.08648.","journal-title":"arXiv:2205.08648"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.23919\/DATE48585.2020.9116263"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00286"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA52012.2021.00011"},{"key":"e_1_3_1_29_2","first-page":"1","article-title":"In-datacenter performance analysis of a tensor processing unit","author":"Jouppi N.","year":"2017","unstructured":"N. Jouppi, C. Young, Nishant Patil, David A. Patterson, Gaurav Agrawal, R. Bajwa, Sarah Bates, Suresh Bhatia, N. Boden, Al Borchers, Rick Boyle, Pierre luc Cantin, Clifford Chao, Chris Clark, Jeremy Coriell, Mike Daley, Matt Dau, J. Dean, Ben Gelb, T. Ghaemmaghami, R. Gottipati, William Gulland, R. Hagmann, C. R. Ho, Doug Hogberg, John Hu, R. Hundt, D. Hurt, J. Ibarz, A. Jaffey, Alek Jaworski, Alexander Kaplan, Harshit Khaitan, Daniel Killebrew, Andy Koch, Naveen Kumar, Steve Lacy, J. Laudon, James Law, Diemthu Le, Chris Leary, Zhuyuan Liu, Kyle A. Lucke, Alan Lundin, G. MacKean, A. Maggiore, Maire Mahony, K. Miller, R. Nagarajan, Ravi Narayanaswami, Ray Ni, K. Nix, Thomas Norrie, Mark Omernick, Narayana Penukonda, A. Phelps, Jonathan Ross, Matt Ross, Amir Salek, E. Samadiani, C. Severn, G. Sizikov, Matthew Snelham, J. Souter, D. Steinberg, Andy Swing, Mercedes Tan, G. Thorson, Bo Tian, H. Toma, Erick Tuttle, Vijay Vasudevan, Richard Walter, Walter Wang, Eric Wilcox, and D. Yoon. 2017. In-datacenter performance analysis of a tensor processing unit. 
In Proceedings of the ACM\/IEEE 44th Annual International Symposium on Computer Architecture (ISCA\u201917), 1\u201312.","journal-title":"Proceedings of the ACM\/IEEE 44th Annual International Symposium on Computer Architecture (ISCA\u201917)"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA52012.2021.00010"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/3352460.3358252"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/3173162.3173176"},{"key":"e_1_3_1_33_2","first-page":"367","volume-title":"Uncertainty in Artificial Intelligence","author":"Li Liam","year":"2020","unstructured":"Liam Li and Ameet Talwalkar. 2020. Random search and reproducibility for neural architecture search. In Uncertainty in Artificial Intelligence. PMLR, 367\u2013377."},{"key":"e_1_3_1_34_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Li Yuhang","year":"2020","unstructured":"Yuhang Li, Xin Dong, and Wei Wang. 2020. Additive powers-of-two quantization: An efficient non-uniform discretization for neural networks. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1145\/3240765.3243479"},{"key":"e_1_3_1_36_2","volume-title":"Handbook of Social Psychology","author":"Mosteller F.","year":"1968","unstructured":"F. Mosteller and J. W. Tukey. 1968. Data analysis, including statistics. In Handbook of Social Psychology, G. Lindzey and E. Aronson (Eds.). Addison-Wesley, Vol. 2."},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2019.00042"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/3140659.3080254"},{"key":"e_1_3_1_39_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Qi Hang","year":"2017","unstructured":"Hang Qi, Evan R. Sparks, and Ameet Talwalkar. 2017. 
Paleo: A performance model for deep neural networks. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2015.376"},{"key":"e_1_3_1_41_2","article-title":"SCALE-Sim: Systolic CNN accelerator simulator","author":"Samajdar Ananda","year":"2018","unstructured":"Ananda Samajdar, Yuhao Zhu, Paul Whatmough, Matthew Mattina, and Tushar Krishna. 2018. SCALE-Sim: Systolic CNN accelerator simulator. arXiv:1811.02883. Retrieved from https:\/\/arxiv.org\/abs\/1811.02883.","journal-title":"arXiv:1811.02883"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCAS51556.2021.9401196"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/3352460.3358302"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2014.6853196"},{"key":"e_1_3_1_45_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Simonyan Karen","year":"2015","unstructured":"Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/MSE.2007.44"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.5555\/3437539.3437590"},{"key":"e_1_3_1_48_2","unstructured":"Mingxing Tan and Quoc Le. 2019. EfficientNet: Rethinking model scaling for convolutional neural networks. In International Conference on Machine Learning . 
PMLR 6105\u20136114."},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01079"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00881"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCAD45719.2019.8942149"},{"key":"e_1_3_1_52_2","first-page":"1","article-title":"Co-exploration of neural architectures and heterogeneous ASIC accelerator designs targeting multiple tasks","author":"Yang Lei","year":"2020","unstructured":"Lei Yang, Zheyu Yan, Meng Li, Hyoukjun Kwon, Liangzhen Lai, Tushar Krishna, Vikas Chandra, Weiwen Jiang, and Yiyu Shi. 2020. Co-exploration of neural architectures and heterogeneous ASIC accelerator designs targeting multiple tasks. In Proceedings of the 57th ACM\/IEEE Design Automation Conference (DAC\u201920), 1\u20136.","journal-title":"Proceedings of the 57th ACM\/IEEE Design Automation Conference (DAC\u201920)"},{"key":"e_1_3_1_53_2","article-title":"Dorefa-net: Training low bitwidth convolutional neural networks with low bitwidth gradients","author":"Zhou Shuchang","year":"2016","unstructured":"Shuchang Zhou, Yuxin Wu, Zekun Ni, Xinyu Zhou, He Wen, and Yuheng Zou. 2016. Dorefa-net: Training low bitwidth convolutional neural networks with low bitwidth gradients. arXiv:1606.06160. Retrieved from https:\/\/arxiv.org\/abs\/1606.06160.","journal-title":"arXiv:1606.06160"},{"key":"e_1_3_1_54_2","article-title":"Rethinking co-design of neural architectures and hardware accelerators","author":"Zhou Yanqi","year":"2021","unstructured":"Yanqi Zhou, Xuanyi Dong, Berkin Akin, Mingxing Tan, Daiyi Peng, Tianjian Meng, Amir Yazdanbakhsh, Da Huang, Ravi Narayanaswami, and James Laudon. 2021. Rethinking co-design of neural architectures and hardware accelerators. arXiv:2102.08619. 
Retrieved from https:\/\/arxiv.org\/abs\/2102.08619.","journal-title":"arXiv:2102.08619"}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3555807","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3555807","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:51:36Z","timestamp":1750182696000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3555807"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,24]]},"references-count":53,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2023,3,31]]}},"alternative-id":["10.1145\/3555807"],"URL":"https:\/\/doi.org\/10.1145\/3555807","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"value":"1539-9087","type":"print"},{"value":"1558-3465","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,1,24]]},"assertion":[{"value":"2021-10-20","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-07-28","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-01-24","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}