{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,2]],"date-time":"2025-11-02T10:26:04Z","timestamp":1762079164821,"version":"build-2065373602"},"reference-count":83,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2023,3,1]],"date-time":"2023-03-01T00:00:00Z","timestamp":1677628800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2023,6,30]]},"abstract":"<jats:p>\n            In this article, we propose a \u201cfull-stack\u201d solution to designing high-capacity and low-latency on-chip cache hierarchies by starting at the circuit level of the hardware design stack. We propose a novel half V\n            <jats:italic>\n              <jats:sub>DD<\/jats:sub>\n            <\/jats:italic>\n            precharge 2T Gain Cell (GC) design for the cache hierarchy. The GC has several desirable characteristics, including ~50% higher storage density and ~50% lower dynamic energy as compared to the traditional 6T SRAM, even after accounting for peripheral circuit overheads. We also demonstrate data retention time of 350 us (~17.5\u00d7 of eDRAM) at 28 nm technology with V\n            <jats:italic>\n              <jats:sub>DD<\/jats:sub>\n            <\/jats:italic>\n            = 0.9V and temperature = 27\u00b0C that, combined with optimizations like staggered refresh, makes it an ideal candidate to architect all levels of on-chip caches. We show that compared to 6T SRAM, for a given area budget, GC-based caches, on average, provide 30% and 36% increase in IPC for single- and multi-programmed workloads, respectively, on contemporary workloads, including SPEC CPU 2017. 
We also observe dynamic energy savings of 42% and 34% for single- and multi-programmed workloads, respectively. Finally, in a quest to utilize the best of all worlds, we combine GC with STT-RAM to create hybrid hierarchies. We show that a hybrid hierarchy with GC caches at L1 and L2 and an LLC split between GC and STT-RAM is able to provide a 46% benefit in energy-delay product (EDP) as compared to an all-SRAM design, and 13% as compared to an all-GC cache hierarchy, averaged across multi-programmed workloads.\n          <\/jats:p>","DOI":"10.1145\/3572839","type":"journal-article","created":{"date-parts":[[2022,11,30]],"date-time":"2022-11-30T13:06:00Z","timestamp":1669813560000},"page":"1-20","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["HyGain: High-performance, Energy-efficient Hybrid Gain Cell-based Cache Hierarchy"],"prefix":"10.1145","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3032-1916","authenticated-orcid":false,"given":"Sarabjeet","family":"Singh","sequence":"first","affiliation":[{"name":"University of Utah, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7841-8541","authenticated-orcid":false,"given":"Neelam","family":"Surana","sequence":"additional","affiliation":[{"name":"NVIDIA Graphics, India"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4873-7728","authenticated-orcid":false,"given":"Kailash","family":"Prasad","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering, Indian Institute of Technology, Gandhinagar, India"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2587-2027","authenticated-orcid":false,"given":"Pranjali","family":"Jain","sequence":"additional","affiliation":[{"name":"University of California, Santa Barbara, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9646-1941","authenticated-orcid":false,"given":"Joycee","family":"Mekie","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering, 
Indian Institute of Technology, Gandhinagar, India"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5616-9679","authenticated-orcid":false,"given":"Manu","family":"Awasthi","sequence":"additional","affiliation":[{"name":"Ashoka University, India"}]}],"member":"320","published-online":{"date-parts":[[2023,3]]},"reference":[{"key":"e_1_3_1_2_2","unstructured":"[n. d.]. 3D V-Cache. Retrieved from https:\/\/www.amd.com\/en\/campaigns\/3d-v-cache."},{"key":"e_1_3_1_3_2","unstructured":"[n. d.]. Micron Technical Note TN-41-01. Retrieved from http:\/\/www.micron.com\/products\/support\/power-calc\/."},{"key":"e_1_3_1_4_2","unstructured":"James W. Adkisson Ramachandra Divakaruni Jeffrey P. Gambino and Jack A. Mandelman. 2002. Embedded DRAM on silicon-on-insulator substrate. US Patent 6 350 653."},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2014.6835978"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2013.6522336"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2014.6835944"},{"key":"e_1_3_1_8_2","article-title":"Prediction hybrid cache: An energy-efficient STT-RAM cache architecture","author":"Ahn Junwhan","year":"2015","unstructured":"Junwhan Ahn, Sungjoo Yoo, and Kiyoung Choi. 2015. Prediction hybrid cache: An energy-efficient STT-RAM cache architecture. IEEE Trans. Comput. 65, 3 (2015).","journal-title":"IEEE Trans. Comput."},{"key":"e_1_3_1_9_2","unstructured":"Yoshiyuki Ando. 2003. Capacitorless DRAM gain cell. 
US Patent 6 560 142."},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2006.45"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/2830772.2830823"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/1454115.1454128"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/2063384.2063454"},{"key":"e_1_3_1_14_2","volume-title":"Proceedings of the 19th International Symposium on High Performance Computer Architecture (HPCA)","author":"Chang Mu-Tien","year":"2013","unstructured":"Mu-Tien Chang, Paul Rosenfeld, Shih-Lien Lu, and Bruce Jacob. 2013. Technology comparison for large last-level caches (L3Cs): Low-leakage SRAM, low write-energy STT-RAM, and refresh-optimized eDRAM. In Proceedings of the 19th International Symposium on High Performance Computer Architecture (HPCA)."},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/1669112.1669164"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/2897937.2897987"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2015.2454241"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2011.2168729"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2011.2128150"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.5555\/545215.545232"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSI.2017.2747087"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2018.2820145"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2015.2394459"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/2989081.2989100"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2014.10"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/1105734.1105739"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/NVMSA.2016.7547174"},{"key":"e_1_3_1_28
_2","doi-asserted-by":"publisher","DOI":"10.1145\/1815961.1815971"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/2508148.2485957"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/2485922.2485952"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/2228360.2228406"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/IEDM.2017.8268514"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC.2015.7063050"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1145\/379240.379268"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2017.2712613"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2014.6835954"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2010.24"},{"key":"e_1_3_1_38_2","article-title":"Read-tuned STT-RAM and eDRAM cache hierarchies for throughput and energy enhancement","author":"Khoshavi Navid","year":"2016","unstructured":"Navid Khoshavi, Xunchao Chen, Jun Wang, and Ronald F. DeMara. 2016. Read-tuned STT-RAM and eDRAM cache hierarchies for throughput and energy enhancement. arXiv preprint arXiv:1607.08086 (2016).","journal-title":"arXiv preprint arXiv:1607.08086"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/3177963"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2005.848019"},{"key":"e_1_3_1_41_2","unstructured":"Wolfgang H. Krautschneider and Werner M. Klingenstein. 1994. Process for the Manufacture of a High Density Cell Array of Gain Memory Cells. 
US Patent 5 308 783."},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2013.6557176"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/379240.379259"},{"key":"e_1_3_1_44_2","volume-title":"Proceedings of the IEEE 19th International Symposium on High Performance Computer Architecture (HPCA)","author":"Lee Donghyuk","year":"2013","unstructured":"Donghyuk Lee, Yoongu Kim, Vivek Seshadri, Jamie Liu, Lavanya Subramanian, and Onur Mutlu. 2013. Tiered-latency DRAM: A low latency and low cost DRAM architecture. In Proceedings of the IEEE 19th International Symposium on High Performance Computer Architecture (HPCA)."},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/IEDM.2018.8614566"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISPDC.2009.14"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1145\/1366110.1366204"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2011.5749733"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.5555\/2337159.2337208"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1145\/2000064.2000074"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2015.2435788"},{"key":"e_1_3_1_52_2","article-title":"CACTI 6.0: A tool to model large caches","author":"Muralimanohar Naveen","year":"2009","unstructured":"Naveen Muralimanohar, Rajeev Balasubramonian, and Norman P. Jouppi. 2009. CACTI 6.0: A tool to model large caches. 
HP Laboratories 27 (2009), 28.","journal-title":"HP Laboratories"},{"key":"e_1_3_1_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2004.10030"},{"key":"e_1_3_1_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2015.2391254"},{"key":"e_1_3_1_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2016.7783704"},{"key":"e_1_3_1_56_2","doi-asserted-by":"publisher","DOI":"10.1145\/1250662.1250709"},{"key":"e_1_3_1_57_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2006.5"},{"key":"e_1_3_1_58_2","volume-title":"Digital Integrated Circuits","author":"Rabaey Jan M.","year":"2002","unstructured":"Jan M. Rabaey, Anantha P. Chandrakasan, and Borivoje Nikolic. 2002. Digital Integrated Circuits. Vol. 2. Prentice Hall, Englewood Cliffs, NJ."},{"key":"e_1_3_1_59_2","doi-asserted-by":"publisher","DOI":"10.1145\/1840845.1840931"},{"key":"e_1_3_1_60_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICICDT.2005.1502578"},{"key":"e_1_3_1_61_2","doi-asserted-by":"publisher","DOI":"10.1145\/2593069.2593086"},{"key":"e_1_3_1_62_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICECS.2018.8618019"},{"key":"e_1_3_1_63_2","volume-title":"Proceedings of the ACM\/SPEC International Conference on Performance Engineering","author":"Singh Sarabjeet","year":"2019","unstructured":"Sarabjeet Singh and Manu Awasthi. 2019. Memory centric characterization and analysis of SPEC CPU2017 suite. 
In Proceedings of the ACM\/SPEC International Conference on Performance Engineering."},{"key":"e_1_3_1_64_2","doi-asserted-by":"publisher","DOI":"10.1145\/356887.356892"},{"key":"e_1_3_1_65_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2011.5749716"},{"key":"e_1_3_1_66_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2008.2007155"},{"key":"e_1_3_1_67_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2009.4798259"},{"key":"e_1_3_1_68_2","doi-asserted-by":"publisher","DOI":"10.1145\/2155620.2155659"},{"key":"e_1_3_1_69_2","article-title":"Energy efficient single-ended 6-T SRAM for multimedia applications","author":"Surana N.","year":"2018","unstructured":"N. Surana and J. Mekie. 2018. Energy efficient single-ended 6-T SRAM for multimedia applications. IEEE Trans. Circ. Syst. II: Expr. Briefs 66, 6 (2018).","journal-title":"IEEE Trans. Circ. Syst. II: Expr. Briefs"},{"key":"e_1_3_1_70_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSII.2014.2305016"},{"key":"e_1_3_1_71_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2018.00037"},{"key":"e_1_3_1_72_2","volume-title":"Proceedings of the IEEE 19th International Symposium on High Performance Computer Architecture (HPCA)","author":"Wang Jue","year":"2013","unstructured":"Jue Wang, Xiangyu Dong, Yuan Xie, and Norman P. Jouppi. 2013. i2WAP: Improving non-volatile cache lifetime by reducing inter- and intra-set write variations. In Proceedings of the IEEE 19th International Symposium on High Performance Computer Architecture (HPCA)."},{"key":"e_1_3_1_73_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2014.6835933"},{"key":"e_1_3_1_74_2","doi-asserted-by":"publisher","DOI":"10.5555\/2337159.2337195"},{"key":"e_1_3_1_75_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC.2015.7062930"},{"key":"e_1_3_1_76_2","volume-title":"CMOS VLSI Design: A Circuits and Systems Perspective","author":"Weste Neil H. E.","year":"2015","unstructured":"Neil H. E. Weste and David Harris. 2015. 
CMOS VLSI Design: A Circuits and Systems Perspective. Pearson Education India."},{"key":"e_1_3_1_77_2","doi-asserted-by":"publisher","DOI":"10.1145\/1815961.1815973"},{"key":"e_1_3_1_78_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC.2012.6176968"},{"key":"e_1_3_1_79_2","doi-asserted-by":"publisher","DOI":"10.1145\/1555754.1555761"},{"key":"e_1_3_1_80_2","volume-title":"Emerging Memory Technologies: Design, Architecture, and Applications","author":"Xie Yuan","year":"2013","unstructured":"Yuan Xie. 2013. Emerging Memory Technologies: Design, Architecture, and Applications. Springer Science & Business Media."},{"key":"e_1_3_1_81_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2009.2035509"},{"key":"e_1_3_1_82_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.micpro.2014.03.007"},{"key":"e_1_3_1_83_2","doi-asserted-by":"publisher","DOI":"10.1145\/1555754.1555759"},{"key":"e_1_3_1_84_2","volume-title":"Proceedings of the International Conference on Computer-aided Design","author":"Zhou Ping","year":"2009","unstructured":"Ping Zhou, Bo Zhao, Jun Yang, and Youtao Zhang. 2009. Energy reduction for STT-RAM using early write termination. 
In Proceedings of the International Conference on Computer-aided Design."}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3572839","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3572839","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:51:08Z","timestamp":1750182668000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3572839"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,3]]},"references-count":83,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2023,6,30]]}},"alternative-id":["10.1145\/3572839"],"URL":"https:\/\/doi.org\/10.1145\/3572839","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2023,3]]},"assertion":[{"value":"2021-10-10","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-11-07","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-03-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}