{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,10]],"date-time":"2026-04-10T20:58:06Z","timestamp":1775854686989,"version":"3.50.1"},"reference-count":64,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2025,5,27]],"date-time":"2025-05-27T00:00:00Z","timestamp":1748304000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Joint Key Funds of National Natural Science Foundation of China","award":["U23A20302"],"award-info":[{"award-number":["U23A20302"]}]},{"name":"China Postdoctoral Science Foundation under Grant","award":["2024M761806"],"award-info":[{"award-number":["2024M761806"]}]},{"name":"Natural Science Foundation of Shandong Province, China","award":["ZR2022ZD02"],"award-info":[{"award-number":["ZR2022ZD02"]}]},{"DOI":"10.13039\/501100006374","name":"NSF","doi-asserted-by":"publisher","award":["MRI2018627, CCF-2005884, CCF-2210753, CCF-2312507, OAC2310510"],"award-info":[{"award-number":["MRI2018627, CCF-2005884, CCF-2210753, CCF-2312507, OAC2310510"]}],"id":[{"id":"10.13039\/501100006374","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Meas. Anal. Comput. Syst."],"published-print":{"date-parts":[[2025,5,27]]},"abstract":"<jats:p>The emerging Ray-tracing cores on GPUs have been repurposed for non-ray-tracing tasks by researchers recently. In this paper, we explore the benefits and effectiveness of executing graph algorithms on RT cores. We re-design breadth-first search and triangle counting on the new hardware as graph algorithm representatives. Our implementations focus on how to convert the graph operations to bounding volume hierarchy construction and ray generation, which are computational paradigms specific to ray tracing. We evaluate our RT-based methods on a wide range of real-world datasets. The results do not show the advantage of the RT-based methods over CUDA-based methods. We extend the experiments to the set intersection workload on synthesized datasets, and the RT-based method shows superior performance when the skew ratio is high. By carefully comparing the RT-based and CUDA-based binary search, we discover that RT cores are more efficient at searching for elements, but this comes with a constant and non-trivial overhead of the execution pipeline. Furthermore, the overhead of BVH construction is substantially higher than sorting on CUDA cores for large datasets. Our case studies unveil several rules of adapting graph algorithms to ray-tracing cores that might benefit future evolution of the emerging hardware towards general-computing tasks.<\/jats:p>","DOI":"10.1145\/3727108","type":"journal-article","created":{"date-parts":[[2025,6,4]],"date-time":"2025-06-04T09:43:35Z","timestamp":1749030215000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["A Case Study for Ray Tracing Cores: Performance Insights with Breadth-First Search and Triangle Counting in Graphs"],"prefix":"10.1145","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0009-0004-7296-9504","authenticated-orcid":false,"given":"Zhixiong","family":"Xiao","sequence":"first","affiliation":[{"name":"Shandong University, Qingdao, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6305-8125","authenticated-orcid":false,"given":"Mengbai","family":"Xiao","sequence":"additional","affiliation":[{"name":"Shandong University, Qingdao, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3976-0870","authenticated-orcid":false,"given":"Yuan","family":"Yuan","sequence":"additional","affiliation":[{"name":"Shandong University, Qingdao, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6835-5981","authenticated-orcid":false,"given":"Dongxiao","family":"Yu","sequence":"additional","affiliation":[{"name":"Shandong University, Qingdao, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-3588-0193","authenticated-orcid":false,"given":"Rubao","family":"Lee","sequence":"additional","affiliation":[{"name":"Freelance, Columbus, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3411-3612","authenticated-orcid":false,"given":"Xiaodong","family":"Zhang","sequence":"additional","affiliation":[{"name":"The Ohio State University, Columbus, USA"}]}],"member":"320","published-online":{"date-parts":[[2025,6,3]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"The GraphBLAS. Retrieved","year":"2024","unstructured":"2017. The GraphBLAS. Retrieved Oct 16, 2024 from https:\/\/graphblas.org\/"},{"key":"e_1_2_1_2_1","volume-title":"Extending GPU Ray-Tracing Units for Hierarchical Search Acceleration. In 2024 57th Annual IEEE\/ACM International Symposium on Microarchitecture.","author":"Barnes Aaron","unstructured":"Aaron Barnes, Fangjia Shen, and Timothy G. Rogers. 2024. Extending GPU Ray-Tracing Units for Hierarchical Search Acceleration. In 2024 57th Annual IEEE\/ACM International Symposium on Microarchitecture."},{"key":"e_1_2_1_3_1","volume-title":"C","author":"Bellas Christos","year":"2022","unstructured":"Christos Bellas and Anastasios Gounaris. 2022. Exploiting GPUs for Fast Intersection of Large Sets. Information Systems 108, C (2022)."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3018743.3018756"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/361002.361007"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2017.2735405"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2020.2971677"},{"key":"e_1_2_1_8_1","volume-title":"Efficient and Scalable Graph Pattern Mining on GPUs. In 16th USENIX Symposium on Operating Systems Design and Implementation. 857--877","author":"Chen Xuhao","year":"2022","unstructured":"Xuhao Chen and Arvind. 2022. Efficient and Scalable Graph Pattern Mining on GPUs. In 16th USENIX Symposium on Operating Systems Design and Implementation. 857--877."},{"key":"e_1_2_1_9_1","volume-title":"Retrieved","author":"NVIDIA Corporation","year":"2020","unstructured":"NVIDIA Corporation. 2020. NVIDIA Ampere GA102 GPU Architecture whitepaper. Retrieved Dec 25, 2023 from https:\/\/www.nvidia.com\/content\/PDF\/nvidia-ampere-ga-102-gpu-architecture-whitepaper-v2.pdf"},{"key":"e_1_2_1_10_1","volume-title":"Retrieved","author":"NVIDIA Corporation","year":"2023","unstructured":"NVIDIA Corporation. 2023. NVIDIA Ada GPU Architecture whitepaper. Retrieved Oct 16, 2024 from https:\/\/images.nvidia.com\/aem-dam\/Solutions\/Data-Center\/l4\/nvidia-ada-gpu-architecture-whitepaper-v2.1.pdf"},{"key":"e_1_2_1_11_1","unstructured":"NVIDIA Corporation. 2023. NVIDIA OptiX 7.5 Programming Guide. Retrieved Dec 5 2023 from https:\/\/raytracing-docs.nvidia.com\/optix7\/guide\/index.html#preface#preface"},{"key":"e_1_2_1_12_1","volume-title":"Nsight Compute Documentation. Retrieved","author":"NVIDIA Corporation","year":"2024","unstructured":"NVIDIA Corporation. 2024. Nsight Compute Documentation. Retrieved Mar 5, 2024 from https:\/\/docs.nvidia.com\/nsight-compute\/index.html"},{"key":"e_1_2_1_13_1","volume-title":"Retrieved","author":"NVIDIA Corporation","year":"2025","unstructured":"NVIDIA Corporation. 2025. Nsight Systems Documentation. Retrieved Mar 27, 2025 from https:\/\/docs.nvidia.com\/nsight-systems\/index.html"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2049662.2049663"},{"key":"e_1_2_1_15_1","volume-title":"Henry Moreton and Brandon Bell","author":"Emmett Kilgariff Nick Stam","year":"2018","unstructured":"Nick Stam Emmett Kilgariff, Henry Moreton and Brandon Bell. 2018. NVIDIA Turing Architecture In-Depth. Retrieved Jan 5, 2024 from https:\/\/developer.nvidia.com\/blog\/nvidia-turing-architecture-in-depth"},{"key":"e_1_2_1_16_1","first-page":"25","article-title":"Fast Radius Search Exploiting Ray Tracing Frameworks","volume":"10","author":"Evangelou I.","year":"2021","unstructured":"I. Evangelou, G. Papaioannou, K. Vardis, and A. A. Vasilakis. 2021. Fast Radius Search Exploiting Ray Tracing Frameworks. Journal of Computer Graphics Techniques 10, 1 (2021), 25--48.","journal-title":"Journal of Computer Graphics Techniques"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2274576.2274578"},{"key":"e_1_2_1_18_1","volume-title":"Fast and Adaptive List Intersections on the GPU. In 2018 IEEE High Performance Extreme Computing Conference. 1--7.","author":"Fox James","unstructured":"James Fox, Oded Green, Kasimir Gabert, Xiaojing An, and David A. Bader. 2018. Fast and Adaptive List Intersections on the GPU. In 2018 IEEE High Performance Extreme Computing Conference. 1--7."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2621934.2621936"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3650200.3656610"},{"key":"e_1_2_1_21_1","volume-title":"Logarithmic Radix Binning and Vectorized Triangle Counting. In 2018 IEEE High Performance Extreme Computing Conference. 1--7.","author":"Green Oded","unstructured":"Oded Green, James Fox, Alex Watkins, Alok Tripathy, Kasimir Gabert, Euna Kim, Xiaojing An, Kumar Aatish, and David A. Bader. 2018. Logarithmic Radix Binning and Vectorized Triangle Counting. In 2018 IEEE High Performance Extreme Computing Conference. 1--7."},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the 26th ACM International Conference on Supercomputing. 331--340","author":"Green Oded","unstructured":"Oded Green, Robert McColl, and David A. Bader. 2012. GPU Merge Path: A GPU Merging Algorithm. In Proceedings of the 26th ACM International Conference on Supercomputing. 331--340."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.5555\/2688283.2688284"},{"key":"e_1_2_1_24_1","volume-title":"Graph Theory Approach to Transportation Systems Design and Optimization. TransNav, the International Journal on Marine Navigation and Safety of Sea Transportation 8, 4","author":"Guze Sambor","year":"2014","unstructured":"Sambor Guze. 2014. Graph Theory Approach to Transportation Systems Design and Optimization. TransNav, the International Journal on Marine Navigation and Safety of Sea Transportation 8, 4 (2014), 571--578."},{"key":"e_1_2_1_25_1","volume-title":"Generalizing Ray Tracing Accelerators for Tree Traversals on GPUs. In 2024 57th Annual IEEE\/ACM International Symposium on Microarchitecture.","author":"Ha Dongho","unstructured":"Dongho Ha, Lufei Liu, Yuan Hsi Chou, Seokjin Go, Won Woo Ro, Hung-Wei Tseng, and Tor M. Aamodt. 2024. Generalizing Ray Tracing Accelerators for Tree Traversals on GPUs. In 2024 57th Annual IEEE\/ACM International Symposium on Microarchitecture."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3196924"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.14778\/3625054.3625063"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevE.65.026107"},{"key":"e_1_2_1_29_1","volume-title":"Triangle Counting on GPU Using Fine-Grained Task Distribution. In 2019 IEEE 35th International Conference on Data Engineering Workshops. 225--232","author":"Hu Lin","year":"2019","unstructured":"Lin Hu, Naiqing Guan, and Lei Zou. 2019. Triangle Counting on GPU Using Fine-Grained Task Distribution. In 2019 IEEE 35th International Conference on Data Engineering Workshops. 225--232."},{"key":"e_1_2_1_30_1","volume-title":"2017 IEEE High Performance Extreme Computing Conference. 1--7.","author":"Hu Yang","unstructured":"Yang Hu, Pradeep Kumar, Guy Swope, and H. Howie Huang. 2017. TriX: Triangle Counting at Extreme Scale. In 2017 IEEE High Performance Extreme Computing Conference. 1--7."},{"key":"e_1_2_1_31_1","volume-title":"Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 171--182","author":"Hu Yang","unstructured":"Yang Hu, Hang Liu, and H. Howie Huang. 2018. TriCore: Parallel Triangle Counting on GPUs. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 171--182."},{"key":"e_1_2_1_32_1","volume-title":"Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing. 239--252","author":"Khorasani Farzad","unstructured":"Farzad Khorasani, Keval Vora, Rajiv Gupta, and Laxmi N. Bhuyan. 2014. CuSha: Vertex-Centric Graph Processing on GPUs. In Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing. 239--252."},{"key":"e_1_2_1_33_1","unstructured":"Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. http:\/\/snap.stanford.edu\/data."},{"key":"e_1_2_1_34_1","volume-title":"Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 1--12","author":"Liu Hang","unstructured":"Hang Liu and H. Howie Huang. 2015. Enterprise: Breadth-First Graph Traversal on GPUs. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 1--12."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3620665.3640360"},{"key":"e_1_2_1_36_1","volume-title":"Proceedings of the VLDB Endowment. 1460--1472","author":"Lv Yangming","unstructured":"Yangming Lv, Kai Zhang, Ziming Wang, Xiaodong Zhang, Rubao Lee, Zhenying He, Yinan Jing, and X. Sean Wang. 2024. RTScan: Efficient Scan with Ray Tracing Cores. In Proceedings of the VLDB Endowment. 1460--1472."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3650200.3656601"},{"key":"e_1_2_1_38_1","volume-title":"Quezada","author":"Meneses Enzo","year":"2024","unstructured":"Enzo Meneses, Crist\u00f3bal A. Navarro, H\u00e9ctor Ferrada, and Felipe A. Quezada. 2024. Accelerating Range Minimum Queries With Ray Tracing Cores. Future Generation Computer Systems 157, C (2024), 98--111."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/2145816.2145832"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/2717511"},{"key":"e_1_2_1_41_1","volume-title":"Retrieved","author":"Morley Keith","year":"2019","unstructured":"Keith Morley. 2019. How to Get Started with OptiX 7. Retrieved Oct 16, 2024 from https:\/\/developer.nvidia.com\/blog\/how-to-get-started-with-optix-7\/"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS54959.2023.00100"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3577193.3593738"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/2517349.2522739"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/3173162.3173180"},{"key":"e_1_2_1_46_1","volume-title":"H-INDEX: Hash-Indexing for Parallel Triangle Counting on GPUs. In 2019 IEEE High Performance Extreme Computing Conference. 1--7.","author":"Pandey Santosh","year":"2019","unstructured":"Santosh Pandey, Xiaoye Sherry Li, Aydin Buluc, Jiejun Xu, and Hang Liu. 2019. H-INDEX: Hash-Indexing for Parallel Triangle Counting on GPUs. In 2019 IEEE High Performance Extreme Computing Conference. 1--7."},{"key":"e_1_2_1_47_1","volume-title":"Using Graph Theory to Analyze Biological Networks. BioData Mining 4, 10","author":"Pavlopoulos Georgios A","year":"2011","unstructured":"Georgios A Pavlopoulos, Maria Secrier, Charalampos N Moschopoulos, Theodoros G Soldatos, Sophia Kossida, Jan Aerts, Reinhard Schneider, and Pantelis G Bagos. 2011. Using Graph Theory to Analyze Biological Networks. BioData Mining 4, 10 (2011)."},{"key":"e_1_2_1_48_1","volume-title":"Counting Triangles in Large Graphs on GPU. In 2016 IEEE International Parallel and Distributed Processing Symposium Workshops. 740--746","author":"Polak Adam","year":"2016","unstructured":"Adam Polak. 2016. Counting Triangles in Large Graphs on GPU. In 2016 IEEE International Parallel and Distributed Processing Symposium Workshops. 740--746."},{"key":"e_1_2_1_49_1","volume-title":"Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence. 4292--4293","author":"Ryan","unstructured":"Ryan A. Rossi and Nesreen K. Ahmed. 2015. The Network Data Repository with Interactive Graph Analytics and Visualization. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence. 4292--4293."},{"key":"e_1_2_1_50_1","doi-asserted-by":"crossref","unstructured":"Justin Salmon and Simon McIntosh-Smith. 2019. Exploiting Hardware-Accelerated Ray Tracing for Monte Carlo Particle Transport with OpenMC. In 2019 IEEE\/ACM Performance Modeling Benchmarking and Simulation of High Performance Computer Systems. 19--29.","DOI":"10.1109\/PMBS49563.2019.00008"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3457279"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2017.2745562"},{"key":"e_1_2_1_53_1","volume-title":"Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. 135--146","author":"Shun Julian","unstructured":"Julian Shun and Guy E. Blelloch. 2013. Ligra: A Lightweight Graph Processing Framework for Shared Memory. In Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. 135--146."},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1021\/c160016a007"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00371-018-1532-8"},{"key":"e_1_2_1_56_1","volume-title":"Proceedings of the Conference on High-Performance Graphics. 7--13","author":"Wald Ingo","year":"2022","unstructured":"Ingo Wald, Will Usher, Nate Morrical, Laura Lediaev, and Valerio Pascucci. 2022. RTX Beyond Ray Tracing: Exploring the Use of Hardware Ray Tracing Cores for Tet-Mesh Point Location. In Proceedings of the Conference on High-Performance Graphics. 7--13."},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/3293883.3295733"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/3108140"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/3466795"},{"key":"e_1_2_1_60_1","volume-title":"Proceedings of the 24th International Conference on Parallel and Distributed Computing. 672--687","author":"Yang Carl","unstructured":"Carl Yang, Aydin Bulu\u00e7, and John D. Owens. 2018. Design Principles for Sparse Matrix Multiplication on the GPU. In Proceedings of the 24th International Conference on Parallel and Distributed Computing. 672--687."},{"key":"e_1_2_1_61_1","volume-title":"Accelerating Force-Directed Graph Drawing with RT Cores. In 2020 IEEE Visualization Conference (VIS). 96--100","author":"Zellmann Stefan","year":"2020","unstructured":"Stefan Zellmann, Martin Weier, and Ingo Wald. 2020. Accelerating Force-Directed Graph Drawing with RT Cores. In 2020 IEEE Visualization Conference (VIS). 96--100."},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1920887"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2013.111"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1145\/3503221.3508409"}],"container-title":["Proceedings of the ACM on Measurement and Analysis of Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3727108","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3727108","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,22]],"date-time":"2025-08-22T21:31:02Z","timestamp":1755898262000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3727108"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,27]]},"references-count":64,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,5,27]]}},"alternative-id":["10.1145\/3727108"],"URL":"https:\/\/doi.org\/10.1145\/3727108","relation":{},"ISSN":["2476-1249"],"issn-type":[{"value":"2476-1249","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,5,27]]},"assertion":[{"value":"2025-06-03","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}