{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,13]],"date-time":"2026-05-13T17:21:45Z","timestamp":1778692905154,"version":"3.51.4"},"reference-count":83,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2023,12,18]],"date-time":"2023-12-18T00:00:00Z","timestamp":1702857600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000028","name":"Semiconductor Research Corporation","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100000028","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Des. Autom. Electron. Syst."],"published-print":{"date-parts":[[2024,1,31]]},"abstract":"<jats:p>Deep neural network (DNN) implementations are typically characterized by huge datasets and concurrent computation, resulting in a demand for high memory bandwidth due to intensive data movement between processors and off-chip memory. Performing DNN inference on general-purpose cores\/edge is gaining traction to enhance user experience and reduce latency. The mismatch in the CPU and conventional DRAM speed leads to under-utilization of the compute capabilities, causing increased inference time. 3D DRAM is a promising solution to effectively fulfill the bandwidth requirement of high-throughput DNNs. However, due to high power density in stacked architectures, 3D DRAMs need dynamic thermal management (DTM), resulting in performance overhead due to memory-induced CPU throttling.<\/jats:p>\n          <jats:p>\n            We study the thermal impact of DNN applications running on a 3D DRAM system, and make a case for a memory temperature-aware customized prefetch mechanism to reduce DTM overheads and significantly improve performance. 
In our proposed\n            <jats:italic>NeuroCool<\/jats:italic>\n            DTM policy, we intelligently place either DRAM ranks or tiers in low power state, using the DNN layer characteristics and access rate. We establish the generalization of our approach through training and test datasets comprising diverse data points from widely used DNN applications. Experimental results on popular DNNs show that NeuroCool results in an average performance gain of 44% (as high as 52%) and memory energy improvement of 43% (as high as 69%) over general-purpose DTM policies.\n          <\/jats:p>\n          <jats:p\/>","DOI":"10.1145\/3630012","type":"journal-article","created":{"date-parts":[[2023,10,23]],"date-time":"2023-10-23T21:18:18Z","timestamp":1698095898000},"page":"1-35","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":11,"title":["NeuroCool: Dynamic Thermal Management of 3D DRAM for Deep Neural Networks through Customized Prefetching"],"prefix":"10.1145","volume":"29","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3013-5128","authenticated-orcid":false,"given":"Shailja","family":"Pandey","sequence":"first","affiliation":[{"name":"Department of CSE, Indian Institute of Technology Delhi, India"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5312-8679","authenticated-orcid":false,"given":"Lokesh","family":"Siddhu","sequence":"additional","affiliation":[{"name":"Department of CSE, Indian Institute of Technology Delhi, India"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2508-7531","authenticated-orcid":false,"given":"Preeti Ranjan","family":"Panda","sequence":"additional","affiliation":[{"name":"Department of CSE, Indian Institute of Technology Delhi, 
India"}]}],"member":"320","published-online":{"date-parts":[[2023,12,18]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2015.2420315"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41586-022-04448-z"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/2534381"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/1629911.1630039"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/3195970.3196128"},{"key":"e_1_3_1_7_2","first-page":"670","volume-title":"Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201918)","author":"Beigi Majed Valad","year":"2018","unstructured":"Majed Valad Beigi and Gokhan Memik. 2018. THOR: Thermal-aware optimizations for extending ReRAM lifetime. In Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201918). IEEE, Los Alamitos, CA, 670\u2013679."},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/3466752.3480114"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/2968455.2974013"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/2629677"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2018.2855122"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/DATE.2012.6176428"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2016.2616357"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/3357526.3357569"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/2390191.2390198"},{"key":"e_1_3_1_16_2","unstructured":"Intel Corporation. 2020. Intel \u00ae Architecture Instruction Set Extensions and Future Features: Programming Reference . 
Intel Corporation."},{"key":"e_1_3_1_17_2","volume-title":"Proceedings of the 2017 Conference on Neural Information Processing Systems","author":"Dean Jeff","year":"2017","unstructured":"Jeff Dean. 2017. Machine learning for systems and systems for machine learning. In Proceedings of the 2017 Conference on Neural Information Processing Systems."},{"key":"e_1_3_1_18_2","article-title":"BERT: Pre-training of deep bidirectional transformers for language understanding","author":"Devlin Jacob","year":"2018","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).","journal-title":"arXiv preprint arXiv:1810.04805"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298878"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2018.00040"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/3286475.3286484"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1145\/144965.145006"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.23919\/DATE48585.2020.9116511"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/EUVIP.2018.8611783"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/16.678551"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/DATE.2011.5763053"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2018.00059"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00065"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.23919\/DATE.2019.8715001"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2021.3127148"},{"key":"e_1_3_1_31_2","unstructured":"Intel. 2022. Intel Max Series Brings Breakthrough Memory Bandwidth and Performance to HPC and AI. 
Retrieved October 31 2023 from https:\/\/www.intel.com\/content\/www\/us\/en\/newsroom\/news\/introducing-intel-max-series-product-family.html"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.23919\/DATE48585.2020.9116510"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/VLSIT.2012.6242474"},{"key":"e_1_3_1_34_2","unstructured":"JEDEC. 2022. JEDEC Standard High Bandwidth Memory DRAM (HBM3) JESD238. Retrieved October 31 2023 from https:\/\/www.jedec.org\/standards-documents\/docs\/jesd238"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1145\/2588889"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/IMW.2017.7939084"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/3508027"},{"key":"e_1_3_1_38_2","article-title":"Full stack optimization of transformer inference: A survey","author":"Kim Sehoon","year":"2023","unstructured":"Sehoon Kim, Coleman Hooper, Thanakul Wattanawong, Minwoo Kang, Ruohan Yan, Hasan Genc, Grace Dinh, Qijing Huang, Kurt Keutzer, Michael W. Mahoney, et\u00a0al. 2023. Full stack optimization of transformer inference: A survey. arXiv preprint arXiv:2302.14017 (2023).","journal-title":"arXiv preprint arXiv:2302.14017"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/3065386"},{"key":"e_1_3_1_40_2","article-title":"CMSIS-NN: Efficient neural network kernels for ARM Cortex-M CPUs","author":"Lai Liangzhen","year":"2018","unstructured":"Liangzhen Lai, Naveen Suda, and Vikas Chandra. 2018. CMSIS-NN: Efficient neural network kernels for ARM Cortex-M CPUs. 
arXiv preprint arXiv:1801.06601 (2018).","journal-title":"arXiv preprint arXiv:1801.06601"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/3223046"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1145\/2133382.2133384"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2016.53"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.23919\/DATE.2018.8342033"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.7873\/DATE.2015.0724"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1145\/1654059.1654116"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/MDT.2005.134"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1145\/2508148.2485928"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1145\/2366231.2337161"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISLPED.2019.8824926"},{"key":"e_1_3_1_51_2","first-page":"1025","volume-title":"Proceedings of the USENIX Annual Technical Conference (USENIX ATC\u201919)","author":"Liu Yizhi","year":"2019","unstructured":"Yizhi Liu, Yao Wang, Ruofei Yu, Mu Li, Vin Sharma, and Yida Wang. 2019. Optimizing CNN model inference on CPUs. In Proceedings of the USENIX Annual Technical Conference (USENIX ATC\u201919). 
1025\u20131040."},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.5555\/2971808.2972061"},{"key":"e_1_3_1_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2015.2409847"},{"key":"e_1_3_1_54_2","doi-asserted-by":"publisher","DOI":"10.1145\/977091.977115"},{"key":"e_1_3_1_55_2","doi-asserted-by":"publisher","DOI":"10.1145\/3284357"},{"key":"e_1_3_1_56_2","doi-asserted-by":"publisher","DOI":"10.1145\/3310273.3323435"},{"key":"e_1_3_1_57_2","doi-asserted-by":"publisher","DOI":"10.1145\/2228360.2228477"},{"key":"e_1_3_1_58_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2020.2982392"},{"key":"e_1_3_1_59_2","doi-asserted-by":"publisher","DOI":"10.1145\/3343030"},{"key":"e_1_3_1_60_2","unstructured":"NXP. 2018. NXP Semiconductors Layerscape\u00ae LX2160A LX2120A LX2080A Processors. Retrieved October 31 2023 from https:\/\/www.nxp.com\/"},{"key":"e_1_3_1_61_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA45697.2020.00021"},{"key":"e_1_3_1_62_2","article-title":"Deep learning inference in Facebook data centers: Characterization, performance optimizations and hardware implications","author":"Park Jongsoo","year":"2018","unstructured":"Jongsoo Park, Maxim Naumov, Protonu Basu, Summer Deng, Aravind Kalaiah, Daya Khudia, James Law, Parth Malani, Andrey Malevich, Satish Nadathur, et\u00a0al. 2018. Deep learning inference in Facebook data centers: Characterization, performance optimizations and hardware implications. arXiv preprint arXiv:1811.09886 (2018).","journal-title":"arXiv preprint arXiv:1811.09886"},{"key":"e_1_3_1_63_2","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC55918.2022.00033"},{"key":"e_1_3_1_64_2","unstructured":"Runjie Zhang Mircea R. Stan and Kevin Skadron. 2015. HotSpot 6.0: Validation Acceleration and Extension . Technical Report CS-2015-04. University of Virginia."},{"key":"e_1_3_1_65_2","unstructured":"Alec Radford Jeff Wu Rewon Child David Luan Dario Amodei and Ilya Sutskever. 2019. 
Language models are unsupervised multitask learners. Preprint."},{"key":"e_1_3_1_66_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41928-020-00515-3"},{"key":"e_1_3_1_67_2","doi-asserted-by":"publisher","DOI":"10.1016\/0377-0427(87)90125-7"},{"key":"e_1_3_1_68_2","doi-asserted-by":"publisher","DOI":"10.1109\/DAC18072.2020.9218505"},{"key":"e_1_3_1_69_2","doi-asserted-by":"publisher","DOI":"10.1145\/3419468"},{"key":"e_1_3_1_70_2","doi-asserted-by":"publisher","DOI":"10.1145\/3532185"},{"key":"e_1_3_1_71_2","doi-asserted-by":"publisher","DOI":"10.1145\/3358208"},{"key":"e_1_3_1_72_2","article-title":"Very deep convolutional networks for large-scale image recognition","author":"Simonyan Karen","year":"2014","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).","journal-title":"arXiv preprint arXiv:1409.1556"},{"key":"e_1_3_1_73_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2016.2602221"},{"key":"e_1_3_1_74_2","unstructured":"Vincent Vanhoucke Andrew Senior and Mark Z. Mao. 2011. Improving the speed of neural networks on CPUs. In Proceedings of the Deep Learning and Unsupervised Feature Learning Workshop (NIPS\u201911) ."},{"key":"e_1_3_1_75_2","article-title":"Attention is all you need","volume":"30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. 
Advances in Neural Information Processing Systems 30 (2017), 1\u201311.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_76_2","doi-asserted-by":"publisher","DOI":"10.1109\/MSSC.2017.2745818"},{"key":"e_1_3_1_77_2","first-page":"20929","article-title":"Explanation-based data augmentation for image classification","volume":"34","author":"Wickramanayake Sandareka","year":"2021","unstructured":"Sandareka Wickramanayake, Wynne Hsu, and Mong Li Lee. 2021. Explanation-based data augmentation for image classification. Advances in Neural Information Processing Systems 34 (2021), 20929\u201320940.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_78_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2019.00048"},{"key":"e_1_3_1_79_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2011.5762710"},{"key":"e_1_3_1_80_2","doi-asserted-by":"publisher","DOI":"10.1145\/216585.216588"},{"key":"e_1_3_1_81_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2018.2858230"},{"key":"e_1_3_1_82_2","doi-asserted-by":"publisher","DOI":"10.1145\/2684746.2689060"},{"key":"e_1_3_1_83_2","doi-asserted-by":"publisher","DOI":"10.1109\/NAS.2014.46"},{"key":"e_1_3_1_84_2","doi-asserted-by":"publisher","DOI":"10.1145\/3316781.3317923"}],"container-title":["ACM Transactions on Design Automation of Electronic 
Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3630012","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3630012","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T23:57:01Z","timestamp":1750291021000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3630012"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,12,18]]},"references-count":83,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,1,31]]}},"alternative-id":["10.1145\/3630012"],"URL":"https:\/\/doi.org\/10.1145\/3630012","relation":{},"ISSN":["1084-4309","1557-7309"],"issn-type":[{"value":"1084-4309","type":"print"},{"value":"1557-7309","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,12,18]]},"assertion":[{"value":"2022-09-03","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-10-08","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-12-18","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}