{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:40:03Z","timestamp":1750192803103,"version":"3.41.0"},"reference-count":27,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2021,5,10]],"date-time":"2021-05-10T00:00:00Z","timestamp":1620604800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000001","name":"US National Science Foundation","doi-asserted-by":"crossref","award":["DUE-1259462, IIA-1358147, CCF-1533828, CCF-1533846, DGE-1565215, DRL-1640039, CRI-1822737, CCF-1823398, CCF-1823417, CCF-1900788, and CCF-1901005"],"award-info":[{"award-number":["DUE-1259462, IIA-1358147, CCF-1533828, CCF-1533846, DGE-1565215, DRL-1640039, CRI-1822737, CCF-1823398, CCF-1823417, CCF-1900788, and CCF-1901005"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2021,9,30]]},"abstract":"<jats:p>While data filter caches (DFCs) have been shown to be effective at reducing data access energy, they have not been adopted in processors due to the associated performance penalty caused by high DFC miss rates. In this article, we present a design that both decreases the DFC miss rate and completely eliminates the DFC performance penalty even for a level-one data cache (L1\u00a0DC) with a single cycle access time. First, we show that a DFC that lazily fills each word in a DFC line from an L1\u00a0DC only when the word is referenced is more energy-efficient than eagerly filling the entire DFC line. 
For a 512B DFC, we are able to eliminate loads of words into the DFC that are never referenced before being evicted, which occurred for about 75% of the words in 32B lines. Second, we demonstrate that a lazily word filled DFC line can effectively share and pack data words from multiple L1\u00a0DC lines to lower the DFC miss rate. For a 512B DFC, we completely avoid accessing the L1\u00a0DC for loads about 23% of the time and avoid a fully associative L1\u00a0DC access for loads 50% of the time, where the DFC only requires about 2.5% of the size of the L1\u00a0DC. Finally, we present a method that completely eliminates the DFC performance penalty by speculatively performing DFC tag checks early and only accessing DFC data when a hit is guaranteed. For a 512B DFC, we improve data access energy usage for the DTLB and L1\u00a0DC by 33% with no performance degradation.<\/jats:p>","DOI":"10.1145\/3449043","type":"journal-article","created":{"date-parts":[[2021,5,10]],"date-time":"2021-05-10T22:15:17Z","timestamp":1620684917000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Decreasing the Miss Rate and Eliminating the Performance Penalty of a Data Filter Cache"],"prefix":"10.1145","volume":"18","author":[{"given":"Michael","family":"Stokes","sequence":"first","affiliation":[{"name":"Florida State University, Tallahassee, Florida"}]},{"given":"David","family":"Whalley","sequence":"additional","affiliation":[{"name":"Florida State University, Tallahassee, Florida"}]},{"given":"Soner","family":"Onder","sequence":"additional","affiliation":[{"name":"Michigan Technological University, Houghton, Michigan"}]}],"member":"320","published-online":{"date-parts":[[2021,5,10]]},"reference":[{"key":"e_1_2_1_1_1","first-page":"4","article-title":"Designing a practical data filter cache to improve both energy efficiency and performance","volume":"10","author":"Bardizbanyan 
A.","year":"2013","unstructured":"A. Bardizbanyan, M. Sj\u00e4lander, D. Whalley, and P. Larsson-Edefors. 2013. Designing a practical data filter cache to improve both energy efficiency and performance. ACM Trans. Archit. Compiler Optim. 10, 4 (Dec. 2013), 54:1--54:25.","journal-title":"ACM Trans. Archit. Compiler Optim."},{"volume-title":"Proceedings of the IEEE International Conference on Computer Design (ICCD\u201913)","author":"Bardizbanyan A.","key":"e_1_2_1_2_1","unstructured":"A. Bardizbanyan, M. Sj\u00e4lander, D. Whalley, and P. Larsson-Edefors. 2013. Speculative tag access for reduced energy dissipation in set-associative L1 data caches. In Proceedings of the IEEE International Conference on Computer Design (ICCD\u201913). 302--308."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2008.224"},{"volume-title":"Proceedings of the IEEE International Conference on Compilers, Architecture, and Synthesis for Embedded Systems. ACM","author":"Duong N.","key":"e_1_2_1_4_1","unstructured":"N. Duong, T. Kim, D. Zhao, and A. Veidenbaum. 2012. Revisiting level-0 caches in embedded processors. 
In Proceedings of the IEEE International Conference on Compilers, Architecture, and Synthesis for Embedded Systems. ACM, New York, NY, 171--180."},{"volume-title":"Proceedings of the ACM SIGPLAN Conference on Languages, Compilers, and Tools for Embedded Systems. 119--128","author":"Hines S.","key":"e_1_2_1_5_1","unstructured":"S. Hines, P. Gavin, Y. Peress, D. Whalley, and G. Tyson. 2009. Guaranteeing instruction fetch behavior with a lookahead instruction fetch engine (LIFE). In Proceedings of the ACM SIGPLAN Conference on Languages, Compilers, and Tools for Embedded Systems. 119--128."},{"volume-title":"Proceedings of the ACM SIGMICRO International Symposium on Microarchitecture. 433--444","author":"Hines S.","key":"e_1_2_1_6_1","unstructured":"S. Hines, D. Whalley, and G. Tyson. 2007. Guaranteeing hits to improve the efficiency of a small instruction cache. In Proceedings of the ACM SIGMICRO International Symposium on Microarchitecture. 433--444."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/313817.313948"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2009.29"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/1878921.1878955"},{"volume-title":"Proceedings of the IEEE International Symposium on Microarchitecture. 184--193","author":"Kin J.","key":"e_1_2_1_10_1","unstructured":"J. Kin, M. Gupta, and W. H. Mangione-Smith. 1997. The filter cache: An energy efficient memory structure. 
In Proceedings of the IEEE International Symposium on Microarchitecture. 184--193."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.822560"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2014.2349503"},{"key":"e_1_2_1_13_1","volume-title":"Richard D. Strong, Jay B. Brockman, Dean M. Tullsen, and Norman P. Jouppi.","author":"Li Sheng","year":"2009","unstructured":"Sheng Li, Jung Ho Ahn, Richard D. Strong, Jay B. Brockman, Dean M. Tullsen, and Norman P. Jouppi. 2009. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures. 469--480. DOI:https:\/\/doi.org\/10.1145\/1669112.1669172"},{"key":"e_1_2_1_14_1","volume-title":"Jay B. Brockman, and Norman P. Jouppi.","author":"Li Sheng","year":"2011","unstructured":"Sheng Li, Ke Chen, Jung Ho Ahn, Jay B. Brockman, and Norman P. Jouppi. 2011. CACTI-P: Architecture-level modeling for SRAM-based structures with advanced leakage reduction techniques. 694--701. 
DOI:https:\/\/doi.org\/10.1109\/ICCAD.2011.6105405"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCL.1998.674159"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2001.991105"},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of the International Conference on Computer Design. 327--333","author":"Pujara Prateek","year":"2005","unstructured":"Prateek Pujara and Aneesh. 2005. Restrictive compression techniques to increase level 1 cache capacity. In Proceedings of the International Conference on Computer Design. 327--333."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.1994.288133"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISLPED.2019.8824951"},{"volume-title":"Proceedings of the International Conference on Computer Design. IEEE Computer Society","author":"Tang W.","key":"e_1_2_1_20_1","unstructured":"W. Tang, R. Gupta, and A. Nicolau. 2001. Design of a predictive filter cache for energy savings in high performance processor architectures. In Proceedings of the International Conference on Computer Design. IEEE Computer Society, Washington, DC, 68--73."},{"key":"e_1_2_1_21_1","first-page":"4","article-title":"Decoupled fused cache: Fusing a decoupled LLC with a DRAM cache","volume":"15","author":"Vasilakis E.","year":"2019","unstructured":"E. Vasilakis, V. Papaefstathiou, P. Trancoso, and I. Sourdis. 2019. Decoupled fused cache: Fusing a decoupled LLC with a DRAM cache. ACM Trans. Archit. Compiler Optim. 15, 4 (Jan. 
2019), 23.","journal-title":"ACM Trans. Archit. Compiler Optim."},{"volume-title":"Proceedings of the IEEE\/ACM International Symposium on Microarchitecture. 214--220","author":"Villa L.","key":"e_1_2_1_22_1","unstructured":"L. Villa, M. Zhang, and K. Asanovic. 2000. Dynamic zero compression for cache energy reduction. In Proceedings of the IEEE\/ACM International Symposium on Microarchitecture. 214--220."},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of the ACM\/IEEE International Symposium on Microarchitecture. 197--207","author":"Yang Jun","year":"2002","unstructured":"Jun Yang and Rajiv Gupta. 2002. Energy efficient frequent value data cache design. In Proceedings of the ACM\/IEEE International Symposium on Microarchitecture. 197--207."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/40.621212"},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of the ACM\/IEEE International Symposium on Microarchitecture. 258--265","author":"Zhang Youtao","year":"2000","unstructured":"Youtao Zhang, Jun Yang, and Rajiv Gupta. 2000. Frequent value compression in data caches. In Proceedings of the ACM\/IEEE International Symposium on Microarchitecture. 
258--265."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/378993.379235"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2002.997880"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3449043","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3449043","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:17:45Z","timestamp":1750191465000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3449043"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,5,10]]},"references-count":27,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2021,9,30]]}},"alternative-id":["10.1145\/3449043"],"URL":"https:\/\/doi.org\/10.1145\/3449043","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2021,5,10]]},"assertion":[{"value":"2020-07-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-02-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-05-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}