{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,12]],"date-time":"2026-06-12T12:52:34Z","timestamp":1781268754150,"version":"3.54.1"},"reference-count":30,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2021,9,3]],"date-time":"2021-09-03T00:00:00Z","timestamp":1630627200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2021,12,31]]},"abstract":"<jats:p>Cache-block compression is a highly effective technique for both reducing accesses to lower levels in the memory hierarchy (cache compression) and minimizing data transfers (link compression). While many effective cache-block compression algorithms have been proposed, the design of these algorithms is largely ad hoc and manual and relies on human recognition of patterns. In this article, we take an entirely different approach. We introduce a class of \u201cbyte-select\u201d compression algorithms, as well as an automated methodology for generating compression algorithms in this class. We argue that, based on upper bounds within the class, the study of this class of byte-select algorithms has potential to yield algorithms with better performance than existing cache-block compression algorithms. The upper bound we establish on the compression ratio is 2X that of any existing algorithm. We then offer a generalized representation of a subset of byte-select compression algorithms and search through the resulting space guided by a set of training data traces. Using this automated process, we find efficient and effective algorithms for various hardware applications. We find that the resulting algorithms exploit novel patterns that can inform future algorithm designs. The generated byte-select algorithms are evaluated against a separate set of traces and evaluations show that Byte-Select has a 23% higher compression ratio on average. While no previous algorithm performs best for all our data sets which include CPU and GPU applications, our generated algorithms do. Using an automated hardware generator for these algorithms, we show that their decompression and compression latency is one and two cycles respectively, much lower than any existing algorithm with a competitive compression ratio.<\/jats:p>","DOI":"10.1145\/3462209","type":"journal-article","created":{"date-parts":[[2021,9,3]],"date-time":"2021-09-03T16:12:01Z","timestamp":1630685521000},"page":"1-27","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Byte-Select Compression"],"prefix":"10.1145","volume":"18","author":[{"given":"Matthew","family":"Tomei","sequence":"first","affiliation":[{"name":"University of Illinois at Urbana-Champaign"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Shomit","family":"Das","sequence":"additional","affiliation":[{"name":"Advanced Micro Devices, Inc."}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Mohammad","family":"Seyedzadeh","sequence":"additional","affiliation":[{"name":"Advanced Micro Devices, Inc."}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Philip","family":"Bedoukian","sequence":"additional","affiliation":[{"name":"Cornell University"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Bradford","family":"Beckmann","sequence":"additional","affiliation":[{"name":"Advanced Micro Devices, Inc."}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Rakesh","family":"Kumar","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana-Champaign"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"David","family":"Wood","sequence":"additional","affiliation":[{"name":"Advanced Micro Devices, Inc., University of Wisconsin-Madison"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2021,9,3]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/2678373.2665696"},{"key":"e_1_2_1_2_1","volume-title":"Computer Architecture, 2004. Proceedings. 31st Annual International Symposium on (2004","author":"Alameldeen A. R.","year":"2017","unstructured":"A. R. Alameldeen and D. A. Wood . 2004. Adaptive cache compression for high-performance processors . In Computer Architecture, 2004. Proceedings. 31st Annual International Symposium on (2004 ), 212\u2013223, Accessed : Jul. 02, 2017 . [Online]. Available: http:\/\/ieeexplore.ieee.org\/abstract\/document\/1310776\/. A. R. Alameldeen and D. A. Wood. 2004. Adaptive cache compression for high-performance processors. In Computer Architecture, 2004. Proceedings. 31st Annual International Symposium on (2004), 212\u2013223, Accessed: Jul. 02, 2017. [Online]. Available: http:\/\/ieeexplore.ieee.org\/abstract\/document\/1310776\/."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2009.2020989"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/2370816.2370870"},{"key":"e_1_2_1_5_1","volume-title":"2013 46th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO) (2013)","author":"Sardashti S.","unstructured":"S. Sardashti and D. A. Wood . 2013. Decoupled compressed cache: Exploiting spatial locality for energy-optimized compressed caching . In 2013 46th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO) (2013) 62\u201373. S. Sardashti and D. A. Wood. 2013. Decoupled compressed cache: Exploiting spatial locality for energy-optimized compressed caching. In 2013 46th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO) (2013) 62\u201373."},{"key":"e_1_2_1_6_1","volume-title":"Microarchitecture (MICRO), 2016 49th Annual IEEE\/ACM International Symposium on (2016","author":"Panda B.","year":"2017","unstructured":"B. Panda and A. Seznec . 2016. Dictionary sharing: An efficient cache compression scheme for compressed caches . In Microarchitecture (MICRO), 2016 49th Annual IEEE\/ACM International Symposium on (2016 ), 1\u201312, Accessed : Jun. 30, 2017 . [Online]. Available: http:\/\/ieeexplore.ieee.org\/abstract\/document\/7783704\/. B. Panda and A. Seznec. 2016. Dictionary sharing: An efficient cache compression scheme for compressed caches. In Microarchitecture (MICRO), 2016 49th Annual IEEE\/ACM International Symposium on (2016), 1\u201312, Accessed: Jun. 30, 2017. [Online]. Available: http:\/\/ieeexplore.ieee.org\/abstract\/document\/7783704\/."},{"key":"e_1_2_1_7_1","volume-title":"Accessed","author":"Alameldeen A. R.","year":"2004","unstructured":"A. R. Alameldeen and D. A. Wood . 2004 . Frequent pattern compression: A significance-based compression scheme for L2 caches. Dept Comp Scie Univ Wis .-Madison Tech Rep 1500, Accessed : Jul. 02, 2017. [Online]. Available: http:\/\/ftp.cs.wisc.edu\/pub\/techreports\/2004\/TR1500.pdf. A. R. Alameldeen and D. A. Wood. 2004. Frequent pattern compression: A significance-based compression scheme for L2 caches. Dept Comp Scie Univ Wis.-Madison Tech Rep 1500, Accessed: Jul. 02, 2017. [Online]. Available: http:\/\/ftp.cs.wisc.edu\/pub\/techreports\/2004\/TR1500.pdf."},{"key":"#cr-split#-e_1_2_1_8_1.1","doi-asserted-by":"crossref","unstructured":"A. Arelakis F. Dahlgren and P. Stenstrom. 2015. HyComp: A hybrid cache compression method for selection of data-type-specific compression methods (date?) 38-49 DOI:10.1145\/2830772.2830823 10.1145\/2830772.2830823","DOI":"10.1145\/2830772.2830823"},{"key":"#cr-split#-e_1_2_1_8_1.2","doi-asserted-by":"crossref","unstructured":"A. Arelakis F. Dahlgren and P. Stenstrom. 2015. HyComp: A hybrid cache compression method for selection of data-type-specific compression methods (date?) 38-49 DOI:10.1145\/2830772.2830823","DOI":"10.1145\/2830772.2830823"},{"key":"e_1_2_1_9_1","volume-title":"2015 48th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO)","author":"Nguyen T. M.","year":"2015","unstructured":"T. M. Nguyen and D. Wentzlaff . 2015. MORC: A manycore-oriented compressed cache . In 2015 48th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO) ( 2015 ), 76\u201388, DOI:10.1145\/2830772.2830828 10.1145\/2830772.2830828 T. M. Nguyen and D. Wentzlaff. 2015. MORC: A manycore-oriented compressed cache. In 2015 48th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO) (2015), 76\u201388, DOI:10.1145\/2830772.2830828"},{"key":"e_1_2_1_10_1","volume-title":"Practical data compression for modern memory hierarchies","author":"Pekhimenko G.","year":"2016","unstructured":"G. Pekhimenko . 2016. Practical data compression for modern memory hierarchies . Carnegie Mellon University ( 2016 ) G. Pekhimenko. 2016. Practical data compression for modern memory hierarchies. Carnegie Mellon University (2016)"},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the 47th Annual IEEE\/ACM International Symposium on Microarchitecture (date), 331\u2013342","author":"Sardashti S.","unstructured":"S. Sardashti , A. Seznec , and D. A. Wood . 2014. Skewed compressed caches . In Proceedings of the 47th Annual IEEE\/ACM International Symposium on Microarchitecture (date), 331\u2013342 . S. Sardashti, A. Seznec, and D. A. Wood. 2014. Skewed compressed caches. In Proceedings of the 47th Annual IEEE\/ACM International Symposium on Microarchitecture (date), 331\u2013342."},{"key":"e_1_2_1_12_1","article-title":"Yet another compressed cache: a low-cost yet effective compressed cache","volume":"13","author":"Sardashti S.","year":"2016","unstructured":"S. Sardashti , A. Seznec , and D. A. Wood . 2016 . Yet another compressed cache: a low-cost yet effective compressed cache . ACM Trans Arch. Code Optim 13 , 3 (2016), 27:1\u201327:25, Sep. 2016, DOI:10.1145\/2976740 10.1145\/2976740 S. Sardashti, A. Seznec, and D. A. Wood. 2016. Yet another compressed cache: a low-cost yet effective compressed cache. ACM Trans Arch. Code Optim 13, 3 (2016), 27:1\u201327:25, Sep. 2016, DOI:10.1145\/2976740","journal-title":"ACM Trans Arch. Code Optim"},{"key":"e_1_2_1_13_1","volume-title":"Proceedings of the 23rd International Conference on Supercomputing 46\u201355","author":"Dusser J.","year":"2017","unstructured":"J. Dusser , T. Piquet , and A. Seznec . 2009. Zero-content augmented caches . In Proceedings of the 23rd International Conference on Supercomputing 46\u201355 , Accessed : Jul. 02, 2017 . [Online]. Available: http:\/\/dl.acm.org\/citation.cfm?id=1542288. J. Dusser, T. Piquet, and A. Seznec. 2009. Zero-content augmented caches. In Proceedings of the 23rd International Conference on Supercomputing 46\u201355, Accessed: Jul. 02, 2017. [Online]. Available: http:\/\/dl.acm.org\/citation.cfm?id=1542288."},{"key":"e_1_2_1_14_1","volume-title":"Enabling Transparent Memory-Compression for Commodity Memory Systems. In 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA). 570--581","author":"Young V.","unstructured":"V. Young , S. Kariyappa , and M. K. Qureshi . 2019 . Enabling Transparent Memory-Compression for Commodity Memory Systems. In 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA). 570--581 . DOI:10.1109\/HPCA.2019.00010 10.1109\/HPCA.2019.00010 V. Young, S. Kariyappa, and M. K. Qureshi. 2019. Enabling Transparent Memory-Compression for Commodity Memory Systems. In 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA). 570--581. DOI:10.1109\/HPCA.2019.00010"},{"key":"e_1_2_1_15_1","doi-asserted-by":"crossref","unstructured":"B. Abali et al. 2001. Memory expansion technology (MXT): Software support and performance. IBM J. Res. Dev 2001.  B. Abali et al. 2001. Memory expansion technology (MXT): Software support and performance. IBM J. Res. Dev 2001.","DOI":"10.1147\/rd.452.0287"},{"key":"e_1_2_1_16_1","volume-title":"High Performance Computer Architecture (HPCA), 2014 IEEE 20th International Symposium on 638\u2013649","author":"Shafiee A.","year":"2016","unstructured":"A. Shafiee , M. Taassori , R. Balasubramonian , and A. K. Davis . 2014. MemZip: Exploring unconventional benefits from memory compression . In High Performance Computer Architecture (HPCA), 2014 IEEE 20th International Symposium on 638\u2013649 , Accessed : May 22, 2016 . [Online]. Available: http:\/\/ieeexplore.ieee.org\/xpls\/abs_all.jsp?arnumber=6835972. A. Shafiee, M. Taassori, R. Balasubramonian, and A. K. Davis. 2014. MemZip: Exploring unconventional benefits from memory compression. In High Performance Computer Architecture (HPCA), 2014 IEEE 20th International Symposium on 638\u2013649, Accessed: May 22, 2016. [Online]. Available: http:\/\/ieeexplore.ieee.org\/xpls\/abs_all.jsp?arnumber=6835972."},{"key":"e_1_2_1_17_1","volume-title":"Accessed","author":"Kim S.","year":"2017","unstructured":"S. Kim , S. Lee , T. Kim , and J. Huh . 2017. Transparent dual memory architecture. Presented at the PACT\u201917 , Accessed : Oct. 07, 2017 . [Online]. Available : http:\/\/calab.kaist.ac.kr:8080\/\u223cjhuh\/papers\/kim_pact17.pdf. S. Kim, S. Lee, T. Kim, and J. Huh. 2017. Transparent dual memory architecture. Presented at the PACT\u201917, Accessed: Oct. 07, 2017. [Online]. Available: http:\/\/calab.kaist.ac.kr:8080\/\u223cjhuh\/papers\/kim_pact17.pdf."},{"key":"e_1_2_1_18_1","unstructured":"M. Rhu M. O'Connor N. Chatterjee J. Pool and S. W. Keckler. 2017. Compressing DMA engine: leveraging activation sparsity for training deep neural networks. ArXiv170501626 Cs May 2017 [Online]. Available: http:\/\/arxiv.org\/abs\/1705.01626.  M. Rhu M. O'Connor N. Chatterjee J. Pool and S. W. Keckler. 2017. Compressing DMA engine: leveraging activation sparsity for training deep neural networks. ArXiv170501626 Cs May 2017 [Online]. Available: http:\/\/arxiv.org\/abs\/1705.01626."},{"key":"e_1_2_1_19_1","volume-title":"Proceedings of the 43rd annual Design Automation Conference 701\u2013704","author":"Yang L.","year":"2017","unstructured":"L. Yang , H. Lekatsas , and R. P. Dick . 2006. High-performance operating system controlled memory compression . In Proceedings of the 43rd annual Design Automation Conference 701\u2013704 , Accessed : Jul. 02, 2017 . [Online]. Available: http:\/\/dl.acm.org\/citation.cfm?id=1147086. L. Yang, H. Lekatsas, and R. P. Dick. 2006. High-performance operating system controlled memory compression. In Proceedings of the 43rd annual Design Automation Conference 701\u2013704, Accessed: Jul. 02, 2017. [Online]. Available: http:\/\/dl.acm.org\/citation.cfm?id=1147086."},{"key":"e_1_2_1_20_1","volume-title":"Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques 325\u2013334","author":"Sathish V.","year":"2017","unstructured":"V. Sathish , M. J. Schulte , and N. S. Kim . 2012. Lossless and lossy memory I\/O link compression for improving performance of GPGPU workloads . In Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques 325\u2013334 , Accessed : Jun. 30, 2017 . [Online]. Available: http:\/\/dl.acm.org\/citation.cfm?id=2370864. V. Sathish, M. J. Schulte, and N. S. Kim. 2012. Lossless and lossy memory I\/O link compression for improving performance of GPGPU workloads. In Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques 325\u2013334, Accessed: Jun. 30, 2017. [Online]. Available: http:\/\/dl.acm.org\/citation.cfm?id=2370864."},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the 43rd International Symposium on Computer Architecture","author":"Kim J.","year":"2016","unstructured":"J. Kim , M. Sullivan , E. Choukse , and M. Erez . 2016. Bit-plane compression: transforming data for better compression in many-core architectures . In Proceedings of the 43rd International Symposium on Computer Architecture , Piscataway, NJ, USA , 2016 , 329\u2013340, DOI:10.1109\/ISCA.2016.37 10.1109\/ISCA.2016.37 J. Kim, M. Sullivan, E. Choukse, and M. Erez. 2016. Bit-plane compression: transforming data for better compression in many-core architectures. In Proceedings of the 43rd International Symposium on Computer Architecture, Piscataway, NJ, USA, 2016, 329\u2013340, DOI:10.1109\/ISCA.2016.37"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3373376.3378518"},{"key":"e_1_2_1_23_1","volume-title":"Oct. 25, 2019","author":"Lattice","year":"2019","unstructured":"Lattice (order). 2019 . Wikipedia . Oct. 25, 2019 , Accessed: Nov. 23, 2019. [Online]. Available : https:\/\/en.wikipedia.org\/w\/index.php?title=Lattice_(order)&oldid=922960068. Lattice (order). 2019. Wikipedia. Oct. 25, 2019, Accessed: Nov. 23, 2019. [Online]. Available: https:\/\/en.wikipedia.org\/w\/index.php?title=Lattice_(order)&oldid=922960068."},{"key":"e_1_2_1_24_1","volume-title":"Artificial Intelligence: A Modern Approach","author":"Russel S.","unstructured":"S. Russel and P. Norvig . Artificial Intelligence: A Modern Approach , Third Edition. S. Russel and P. Norvig. Artificial Intelligence: A Modern Approach, Third Edition."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1186736.1186737"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.2172\/1059462"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2009.5306797"},{"key":"e_1_2_1_28_1","doi-asserted-by":"crossref","unstructured":"C. E. Shannon. A mathematical theory of communication. Bell Labs Tech. J. 27 3 55.  C. E. Shannon. A mathematical theory of communication. Bell Labs Tech. J. 27 3 55.","DOI":"10.1002\/j.1538-7305.1948.tb01338.x"},{"key":"e_1_2_1_29_1","volume-title":"Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems","author":"Tsai P.-A.","unstructured":"P.-A. Tsai and D. Sanchez . 2019. Compress Objects, Not Cache Lines: An Object-Based Compressed Memory Hierarchy . In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems . New York, NY, 229--242. DOI:10.1145\/3297858.3304006 10.1145\/3297858.3304006 P.-A. Tsai and D. Sanchez. 2019. Compress Objects, Not Cache Lines: An Object-Based Compressed Memory Hierarchy. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems. New York, NY, 229--242. DOI:10.1145\/3297858.3304006"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3462209","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3462209","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:48:53Z","timestamp":1750193333000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3462209"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,9,3]]},"references-count":30,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2021,12,31]]}},"alternative-id":["10.1145\/3462209"],"URL":"https:\/\/doi.org\/10.1145\/3462209","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,9,3]]},"assertion":[{"value":"2020-09-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-04-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-09-03","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}