{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,25]],"date-time":"2025-12-25T12:38:36Z","timestamp":1766666316866,"version":"3.41.0"},"reference-count":58,"publisher":"Association for Computing Machinery (ACM)","issue":"2","funder":[{"name":"Key Program of the National Natural Science Foundation of China","award":["62032001"],"award-info":[{"award-number":["62032001"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62472436"],"award-info":[{"award-number":["62472436"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Foundation of Key Laboratory","award":["WDZC20245250110"],"award-info":[{"award-number":["WDZC20245250110"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2025,6,30]]},"abstract":"<jats:p>This article finds that the reusability of vertices in the same graph in graph processing differs, and the high-reuse and low-reuse vertices are stored together. These phenomena lead to the inability of existing GPU architectures to capture the reusability of graph processing. The most advanced cache optimization strategies cannot implement different management strategies for data with different reusability, which is an essential reason for graph processing\u2019s poor performance. Therefore, we propose a TSN cache scheme for the GPU platform. This scheme employs distinct management strategies for data with varying reusability in the cache, effectively leveraging the locality of these different data types. In addition, the TSN cache scheme can also reduce the probability of cache thrashing caused by low-reuse data. This article evaluates multiple graph algorithms and datasets and shows that the TSN cache scheme achieves an average speedup of 1.38 compared with the baseline scheme.<\/jats:p>","DOI":"10.1145\/3721286","type":"journal-article","created":{"date-parts":[[2025,3,4]],"date-time":"2025-03-04T11:22:22Z","timestamp":1741087342000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["TSN Cache: Exploiting Data Localities in Graph Computing Applications"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0009-0003-8054-9131","authenticated-orcid":false,"given":"Chaoyang","family":"Jia","sequence":"first","affiliation":[{"name":"National University of Defense Technology","place":["Changsha, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6107-4124","authenticated-orcid":false,"given":"Jingyu","family":"Liu","sequence":"additional","affiliation":[{"name":"National University of Defense Technology","place":["Changsha, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-0009-2624","authenticated-orcid":false,"given":"Shi","family":"Chen","sequence":"additional","affiliation":[{"name":"National University of Defense Technology","place":["Changsha, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6378-7002","authenticated-orcid":false,"given":"Kai","family":"Lu","sequence":"additional","affiliation":[{"name":"National University of Defense Technology","place":["Changsha, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9043-2998","authenticated-orcid":false,"given":"Li","family":"Shen","sequence":"additional","affiliation":[{"name":"Department of Computing Science, National University of Defense Technology","place":["Changsha, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,7]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/1383422.1383443"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2009.4919648"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA51647.2021.00062"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2019.00051"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/3159652.3159731"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2016.7498258"},{"key":"e_1_3_1_8_2","article-title":"On compressing social networks","author":"Chierichetti Flavio","year":"2009","unstructured":"Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, Michael Mitzenmacher, and Prabhakar Raghavan. 2009. On compressing social networks. ACM (2009).","journal-title":"ACM"},{"issue":"99","key":"e_1_3_1_9_2","first-page":"1","article-title":"NVIDIA A100 tensor core GPU: Performance and innovation","author":"Choquette Jack","year":"2021","unstructured":"Jack Choquette, Wishwesh Gandhi, Olivier Giroux, Nick Stam, and Ronny Krashinsky. 2021. NVIDIA A100 tensor core GPU: Performance and innovation. IEEE Micro PP, 99 (2021), 1\u20131.","journal-title":"IEEE Micro"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/2049662.2049663"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511761942"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC47752.2019.9041948"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA47549.2020.00028"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2017.32"},{"key":"e_1_3_1_15_2","first-page":"17","volume-title":"Presented as part of the 10th  \\(\\lbrace\\) USENIX \\(\\rbrace\\)  Symposium on Operating Systems Design and Implementation ( \\(\\lbrace\\) OSDI \\(\\rbrace\\) \u201912)","author":"Gonzalez Joseph E.","year":"2012","unstructured":"Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. 2012. Powergraph: Distributed graph-parallel computation on natural graphs. In Presented as part of the 10th \\(\\lbrace\\) USENIX \\(\\rbrace\\) Symposium on Operating Systems Design and Implementation ( \\(\\lbrace\\) OSDI \\(\\rbrace\\) \u201912). 17\u201330."},{"key":"e_1_3_1_16_2","volume-title":"GPGPU-Sim Overview.","author":"Hughes Clayton","year":"2019","unstructured":"Clayton Hughes, Roland Green, Gwendolyn Renae Voskuilen, Mengchi Zhang, and Tim Rogers. 2019. GPGPU-Sim Overview.Technical Report. Sandia National Lab.(SNL-NM), Albuquerque, NM (United States)."},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2016.17"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2018.00020"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/1816038.1815971"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/3123939.3123942"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2010.24"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1145\/3093336.3037701"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/1150402.1150476"},{"key":"e_1_3_1_24_2","unstructured":"Conrad Lee Fergal Reid Aaron McDaid and Neil Hurley. 2010. Detecting highly overlapping community structure by greedy clique expansion. arXiv preprint arXiv:1002.1827 (2010)."},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1631\/FITEE.1900127"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","unstructured":"Yashuai L\u00fc Hui Guo Libo Huang Qi Yu Li Shen Nong Xiao and Zhiying Wang. 2021. GraphPEG: Accelerating graph processing on GPUs. ACM Trans. Archit. Code Optim. 18 3 (May 2021). DOI:10.1145\/3450440","DOI":"10.1145\/3450440"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/3064176.3064191"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2022.3157525"},{"key":"e_1_3_1_29_2","unstructured":"Naveen Muralimanohar Rajeev Balasubramonian and Norman P. Jouppi. 2009. CACTI 6.0: A tool to understand large caches. University of Utah and Hewlett Packard Laboratories Tech. Rep. 147 (2009)."},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/2807591.2807626"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/2517349.2522739"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS49936.2021.00016"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2016.7783704"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1145\/3297858.3304064"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2014.50"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1145\/3457207"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/1345206.1345220"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2014.41"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/2976740"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA51647.2021.00033"},{"key":"e_1_3_1_41_2","first-page":"558","volume-title":"Proceedings of the 2022 IEEE International Symposium on High-Performance Computer Architecture","author":"Shah Ishan","year":"2022","unstructured":"Ishan Shah, Akanksha Jain, and Calvin Lin. 2022. Effective mimicry of Belady\u2019s MIN policy. In Proceedings of the 2022 IEEE International Symposium on High-Performance Computer Architecture. IEEE, 558\u2013572."},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1145\/3352460.3358319"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2016.0058"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/2442516.2442530"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICBAIE56435.2022.9985846"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2016.7446065"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2016.7783705"},{"key":"e_1_3_1_48_2","unstructured":"John A. Stratton Christopher Rodrigues I-Jui Sung Nady Obeid Li-Wen Chang Nasser Anssari Geng Daniel Liu and Wen-mei W. Hwu. 2012. Parboil: A revised benchmark suite for scientific and commercial throughput computing. Center for Reliable and High-Performance Computing 127 7.2 (2012)."},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/LCA.2017.2762660"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2018.00027"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1145\/3093337.3037744"},{"key":"e_1_3_1_52_2","first-page":"763","volume-title":"Proceedings of the OSDI","author":"Wang Kai","year":"2018","unstructured":"Kai Wang, Zhiqiang Zuo, John Thorpe, Tien Quang Nguyen, and Guoqing Harry Xu. 2018. RStream: Marrying relational algebra with streaming for efficient graph mining on a single machine.. In Proceedings of the OSDI. 763\u2013782."},{"key":"e_1_3_1_53_2","doi-asserted-by":"publisher","DOI":"10.1145\/2155620.2155671"},{"key":"e_1_3_1_54_2","doi-asserted-by":"publisher","DOI":"10.5555\/3433701.3433815"},{"key":"e_1_3_1_55_2","doi-asserted-by":"publisher","DOI":"10.1021\/acs.jproteome.0c00316"},{"key":"e_1_3_1_56_2","doi-asserted-by":"publisher","DOI":"10.1007\/S11704-020-9485-2"},{"key":"e_1_3_1_57_2","doi-asserted-by":"publisher","unstructured":"Dunbo Zhang Qingjie Lang Ruoxi Wang and Li Shen. 2024. Extension VM: Interleaved data layout in vector memory. ACM Trans. Archit. Code Optim. 21 1 (February 2024). DOI:10.1145\/3631528","DOI":"10.1145\/3631528"},{"key":"e_1_3_1_58_2","doi-asserted-by":"publisher","DOI":"10.1145\/3289604"},{"key":"e_1_3_1_59_2","first-page":"441","volume-title":"Proceedings of the 2018 USENIX Annual Technical Conference, USENIX ATC 2018, Boston, MA, USA, July 11\u201313, 2018.","author":"Zhang Yu","year":"2018","unstructured":"Yu Zhang, Xiaofei Liao, Hai Jin, Lin Gu, Ligang He, Bingsheng He, and Haikun Liu. 2018. CGraph: A correlations-aware approach for efficient concurrent iterative graph processing. In Proceedings of the 2018 USENIX Annual Technical Conference, USENIX ATC 2018, Boston, MA, USA, July 11\u201313, 2018.Haryadi S. Gunawi and Benjamin C. Reed (Eds.), USENIX Association, 441\u2013452. Retrieved from https:\/\/www.usenix.org\/conference\/atc18\/presentation\/zhang-yu"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3721286","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,1]],"date-time":"2025-07-01T12:32:33Z","timestamp":1751373153000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3721286"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,30]]},"references-count":58,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,6,30]]}},"alternative-id":["10.1145\/3721286"],"URL":"https:\/\/doi.org\/10.1145\/3721286","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2025,6,30]]},"assertion":[{"value":"2024-05-29","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-02-09","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-07-01","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}