{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T05:06:11Z","timestamp":1750309571763,"version":"3.41.0"},"reference-count":41,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2025,3,21]],"date-time":"2025-03-21T00:00:00Z","timestamp":1742515200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National NSF of China","award":["62141218"],"award-info":[{"award-number":["62141218"]}]},{"name":"Shanghai Key Laboratory of Scalable Computing and Systems"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2025,3,31]]},"abstract":"<jats:p>\n            Today, GPUs significantly boost rendering performance. However, the high memory requirements limit their use, especially on low-end mobile platforms. Compression techniques have been widely adopted to reduce memory consumption but face two primary issues when applied to mobile GPUs: (1)\n            <jats:italic>low repetition ratio<\/jats:italic>\n            caused by small raw data sizes and concurrency, and (2)\n            <jats:italic>low locality<\/jats:italic>\n            caused by unpredictable rendering behaviors. These two limitations result in a low compression ratio when compressors are applied to low-end mobile devices.\n          <\/jats:p>\n          <jats:p>\n            This article introduces\n            <jats:italic>gCom<\/jats:italic>\n            , a fine-grained rendering compressor accelerated by GPUs. To improve the compression ratio,\n            <jats:italic>gCom<\/jats:italic>\n            incorporates the following innovations. First, unlike other compression techniques that use frames or tiles as basic processing units,\n            <jats:italic>gCom<\/jats:italic>\n            is the first to employ a fine-grained processing unit (i.e., the color channel), enhancing repetition amplification without increasing raw data. Second,\n            <jats:italic>gCom<\/jats:italic>\n            introduces two key features\u2014\n            <jats:italic>Hierarchical Delta<\/jats:italic>\n            and\n            <jats:italic>Channel Decorrelator<\/jats:italic>\n            \u2014which maximize the locality of adjacent channels and reduce raw data size. Third, to maintain the original GPU throughput,\n            <jats:italic>gCom<\/jats:italic>\n            revolutionizes the Golomb-Rice algorithm and proposes a new compression approach, the Parallel-Oriented Golomb-Rice algorithm, enabling parallel execution of both decompression and compression processes. The entire design of\n            <jats:italic>gCom<\/jats:italic>\n            utilizes only idle resources and existing commands on mobile GPUs, thus keeping purchasing costs low.\n          <\/jats:p>\n          <jats:p>\n            To date,\n            <jats:italic>gCom<\/jats:italic>\n            has improved the channel locality by nearly 50%. The best compression achievement received by\n            <jats:italic>gCom<\/jats:italic>\n            has reached around 20%.\n          <\/jats:p>","DOI":"10.1145\/3711819","type":"journal-article","created":{"date-parts":[[2025,1,9]],"date-time":"2025-01-09T03:44:37Z","timestamp":1736394277000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["gCom: Fine-grained Compressors in Graphics Memory of Mobile GPU"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0009-0006-7178-5097","authenticated-orcid":false,"given":"Dongjie","family":"Tang","sequence":"first","affiliation":[{"name":"Shanghai Custom College, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-9552-9357","authenticated-orcid":false,"given":"Zijun","family":"Wu","sequence":"additional","affiliation":[{"name":"Shanghai Custom College, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-7969-6891","authenticated-orcid":false,"given":"Yun","family":"Wang","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-7819-5667","authenticated-orcid":false,"given":"Yicheng","family":"Gu","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8769-293X","authenticated-orcid":false,"given":"Fangxin","family":"Liu","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University School of Electronic Information and Electrical Engineering, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2730-2319","authenticated-orcid":false,"given":"Zhengwei","family":"Qi","sequence":"additional","affiliation":[{"name":"School of Software, Shanghai Jiao Tong University, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,3,21]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/1142473.1142548"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2019.00015"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2019.00014"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/3373376.3378498"},{"key":"e_1_3_2_6_2","first-page":"499","volume-title":"Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI \u201920)","author":"Bai Zhihao","year":"2020","unstructured":"Zhihao Bai, Zhen Zhang, Yibo Zhu, and Xin Jin. 2020. PipeSwitch: Fast pipelined context switching for deep learning applications. In Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI \u201920). 499\u2013514. https:\/\/www.usenix.org\/conference\/osdi20\/presentation\/bai"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1920927"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2015.46"},{"key":"e_1_3_2_9_2","volume-title":"Proceedings of the 2019 USENIX Annual Technical Conference (USENIX ATC \u201919)","author":"Gao Xiang","year":"2019","unstructured":"Xiang Gao, Mingkai Dong, Xie Miao, Wei Du, Chao Yu, and Haibo Chen. 2019. EROFS: A compression-friendly readonly file system for resource-scarce devices. In Proceedings of the 2019 USENIX Annual Technical Conference (USENIX ATC \u201919). 149\u2013162. https:\/\/www.usenix.org\/conference\/atc19\/presentation\/gao"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.1998.655800"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.1966.1053907"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/3570361.3592530"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/3068281"},{"key":"e_1_3_2_14_2","volume-title":"Proceedings of the 2019 USENIX Annual Technical Conference (USENIX ATC \u201919)","author":"Hu Xiaokang","year":"2019","unstructured":"Xiaokang Hu, Fuzong Wang, Weigang Li, Jian Li, and Haibing Guan. 2019. QZFS: QAT accelerated compression in file system for application agnostic and cost efficient data storage. In Proceedings of the 2019 USENIX Annual Technical Conference (USENIX ATC \u201919). 163\u2013176. https:\/\/www.usenix.org\/conference\/atc19\/presentation\/hu-xiaokang"},{"key":"e_1_3_2_15_2","volume-title":"Proceedings of the 19th USENIX Conference on File and Storage Technologies (FAST \u201921)","author":"Ji Cheng","year":"2021","unstructured":"Cheng Ji, Li-Pin Chang, Riwei Pan, Chao Wu, Congming Gao, Liang Shi, Tei-Wei Kuo, and Chun Jason Xue. 2021. Pattern-guided file compression with user-experience enhancement for log-structured file system on mobile devices. In Proceedings of the 19th USENIX Conference on File and Storage Technologies (FAST \u201921). 127\u2013140. https:\/\/www.usenix.org\/conference\/fast21\/presentation\/ji"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/3352460.3358286"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1016\/S1383-7621(00)00035-7"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/3372224.3419214"},{"key":"e_1_3_2_19_2","article-title":"Decoding billions of integers per second through vectorization","volume":"1209","author":"Lemire Daniel","year":"2012","unstructured":"Daniel Lemire and Leonid Boytsov. 2012. Decoding billions of integers per second through vectorization. CoRR abs\/1209.2137 (2012). http:\/\/arxiv.org\/abs\/1209.2137","journal-title":"CoRR"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.14778\/3007328.3007331"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1016\/J.SYSARC.2022.102627"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/DAC18072.2020.9218517"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNET.2015.2450254"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/3503222.3507745"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICPADS.2012.16"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/3445814.3446722"},{"key":"e_1_3_2_27_2","volume-title":"Proceedings of the 2018 USENIX Annual Technical Conference (USENIX ATC \u201918)","author":"Pekhimenko Gennady","year":"2018","unstructured":"Gennady Pekhimenko, Chuanxiong Guo, Myeongjae Jeon, Peng Huang, and Lidong Zhou. 2018. TerseCades: Efficient data compression in stream processing. In Proceedings of the 2018 USENIX Annual Technical Conference (USENIX ATC \u201918). 307\u2013320. https:\/\/www.usenix.org\/conference\/atc18\/presentation\/pekhimenko"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/3579371.3589057"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA51647.2021.00065"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/163090.163096"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/3576915.3623124"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3526132"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/3466752.3480046"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/83.855427"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2019.00042"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1145\/3373376.3378466"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/3579371.3589077"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/3627703.3629557"},{"key":"e_1_3_2_39_2","volume-title":"Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI \u201920)","author":"Zhao Hanyu","year":"2020","unstructured":"Hanyu Zhao, Zhenhua Han, Zhi Yang, Quanlu Zhang, Fan Yang, Lidong Zhou, Mao Yang, Francis C. M. Lau, Yuqi Wang, Yifan Xiong, et al. 2020. HiveD: Sharing a GPU cluster for deep learning with guarantees. In Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI \u201920). 515\u2013532. https:\/\/www.usenix.org\/conference\/osdi20\/presentation\/zhao-hanyu"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDCS.2018.00077"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/TRUSTCOM.2014.90"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2006.150"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3711819","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3711819","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:19:15Z","timestamp":1750295955000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3711819"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,21]]},"references-count":41,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,3,31]]}},"alternative-id":["10.1145\/3711819"],"URL":"https:\/\/doi.org\/10.1145\/3711819","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2025,3,21]]},"assertion":[{"value":"2024-09-04","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-12-12","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-03-21","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}