{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,18]],"date-time":"2026-01-18T01:42:20Z","timestamp":1768700540674,"version":"3.49.0"},"reference-count":80,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2021,4,27]],"date-time":"2021-04-27T00:00:00Z","timestamp":1619481600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Graph."],"published-print":{"date-parts":[[2021,4,30]]},"abstract":"<jats:p>This article presents a solution to path tracing of massive scenes on multiple GPUs. Our approach analyzes the memory access pattern of a path tracer and defines how the scene data should be distributed across up to 16 GPUs with minimal effect on performance. The key concept is that the parts of the scene that have the highest amount of memory accesses are replicated on all GPUs.<\/jats:p>\n          <jats:p>We propose two methods for maximizing the performance of path tracing when working with partially distributed scene data. Both methods work on the memory management level and therefore path tracer data structures do not have to be redesigned, making our approach applicable to other path tracers with only minor changes in their code. As a proof of concept, we have enhanced the open-source Blender Cycles path tracer.<\/jats:p>\n          <jats:p>The approach was validated on scenes of sizes up to 169 GB. We show that only 1\u20135% of the scene data needs to be replicated to all machines for such large scenes. On smaller scenes we have verified that the performance is very close to rendering a fully replicated scene. 
In terms of scalability we have achieved a parallel efficiency of over 94% using up to 16 GPUs.<\/jats:p>","DOI":"10.1145\/3447807","type":"journal-article","created":{"date-parts":[[2021,4,27]],"date-time":"2021-04-27T15:20:51Z","timestamp":1619536851000},"page":"1-17","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":10,"title":["GPU Accelerated Path Tracing of Massive Scenes"],"prefix":"10.1145","volume":"40","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4630-5339","authenticated-orcid":false,"given":"Milan","family":"Jaro\u0161","sequence":"first","affiliation":[{"name":"IT4Innovations, VSB\u2013Technical University of Ostrava, Czech Republic"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lubom\u00edr","family":"\u0158\u00edha","sequence":"additional","affiliation":[{"name":"IT4Innovations, VSB\u2013Technical University of Ostrava, Czech Republic"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Petr","family":"Strako\u0161","sequence":"additional","affiliation":[{"name":"IT4Innovations, VSB\u2013Technical University of Ostrava, Czech Republic"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mat\u011bj","family":"\u0160pe\u0165ko","sequence":"additional","affiliation":[{"name":"IT4Innovations, VSB\u2013Technical University of Ostrava, Czech Republic"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,4,27]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Proceedings of the IEEE 21st International Symposium on High Performance Computer Architecture (HPCA\u201915)","author":"Agarwal Neha","unstructured":"
Neha Agarwal, David Nellans, Mike O\u2019Connor, Stephen W. Keckler, and Thomas F. Wenisch. 2015. Unlocking bandwidth for GPUs in CC-NUMA systems. In Proceedings of the IEEE 21st International Symposium on High Performance Computer Architecture (HPCA\u201915). IEEE, 354\u2013365."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.5555\/1921479.1921497"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1572769.1572792"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/2751205.2751210"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2464996.2465021"},{"key":"e_1_2_1_7_1","unstructured":"AMD. 2017. AMD EPYC SoC Delivers Exceptional Results on the STREAM Benchmark on 2P Servers. Retrieved from https:\/\/www.amd.com\/system\/files\/2017-06\/AMD-EPYC-SoC-Delivers-Exceptional-Results.pdf."},{"key":"e_1_2_1_8_1","unstructured":"Atos. 2017. BullSequana X410 E5 Dense GPU-Accelerated Compute Node. Retrieved from https:\/\/atos.net\/wp-content\/uploads\/2017\/11\/FS_BullSequana_X410E5_en1-web.pdf."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA47549.2020.00055"},{"key":"e_1_2_1_10_1","unstructured":"Jeremy Birn. 2015. 3dRender.com: Lighting Challenges. Retrieved from http:\/\/www.3drender.com\/challenges\/."},{"key":"e_1_2_1_11_1","unstructured":"Blender Foundation. 2018. 
Cycles Open Source Production Rendering. Retrieved from https:\/\/www.cycles-renderer.org\/."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.1467-8659.2009.01378.x"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3182159"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.1467-8659.2008.01253.x"},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC\u201919)","author":"Chien Steven W. D.","year":"2020","unstructured":"Steven W. D. Chien, Ivy B. Peng, and Stefano Markidis. 2020. Performance evaluation of advanced features in CUDA unified memory. In Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC\u201919)."},{"key":"e_1_2_1_16_1","volume-title":"Proceedings of the IEEE Symposium on Large Data Analysis and Visualization (LDAV\u201916)","author":"Christensen Cameron","year":"2017","unstructured":"Cameron Christensen, Thomas Fogal, Nathan Luehr, and Cliff Woolley. 2017. Topology-aware image compositing using NVLink. In Proceedings of the IEEE Symposium on Large Data Analysis and Visualization (LDAV\u201916). 
93\u201394."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3182162"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/37401.37414"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2005.02.007"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3182161"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3307650.3322224"},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201920)","author":"Ganguly Debashis","year":"2020","unstructured":"Debashis Ganguly, Ziyu Zhang, Jun Yang, and Rami Melhem. 2020. Adaptive page migration for irregular data-intensive applications under GPU memory oversubscription. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201920). IEEE, 451\u2013461."},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of the IEEE\/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS\u201919)","author":"Gayatri Rahulkumar","year":"2019","unstructured":"Rahulkumar Gayatri, Kevin Gott, and Jack Deslippe. 2019. Comparing managed memory and ATS with and without prefetching on NVIDIA Volta GPUs. 
In Proceedings of the IEEE\/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS\u201919). 41\u201346. DOI:https:\/\/doi.org\/10.1109\/PMBS49563.2019.00010"},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems","author":"Gelado Isaac","unstructured":"Isaac Gelado, John E. Stone, Javier Cabezas, Sanjay Patel, Nacho Navarro, and Wen-mei W. Hwu. 2010a. An asymmetric distributed shared memory model for heterogeneous parallel systems. In Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems. 347\u2013358."},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201910)","author":"Gelado Isaac","unstructured":"Isaac Gelado, John E. Stone, Javier Cabezas, Sanjay Patel, Nacho Navarro, and Wen-mei W. Hwu. 2010b. An asymmetric distributed shared memory model for heterogeneous parallel systems. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201910). 
347\u2013358."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3182160"},{"key":"e_1_2_1_27_1","unstructured":"Mark Harris. 2017. Unified Memory for CUDA Beginners. Retrieved from https:\/\/devblogs.nvidia.com\/unified-memory-cuda-beginners\/."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/5.747863"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/2370036.2145818"},{"key":"e_1_2_1_30_1","unstructured":"IT4Innovations. 2019. Barbora supercomputer cluster. Retrieved from https:\/\/docs.it4i.cz\/barbora\/introduction\/."},{"key":"e_1_2_1_31_1","volume-title":"Proceedings of the 10th International Symposium on Code Generation and Optimization","author":"Jablin Thomas B.","unstructured":"Thomas B. Jablin, James A. Jablin, Prakash Prabhu, Feng Liu, and David I. August. 2012a. Dynamically managed data for CPU-GPU architectures. In Proceedings of the 10th International Symposium on Code Generation and Optimization. 165\u2013174."},{"key":"e_1_2_1_32_1","volume-title":"Proceedings of the International Symposium on Code Generation and Optimization (CGO\u201912)","author":"Jablin Thomas B.","unstructured":"Thomas B. Jablin, James A. 
Jablin, Prakash Prabhu, Feng Liu, and David I. August. 2012b. Dynamically managed data for CPU-GPU architectures. In Proceedings of the International Symposium on Code Generation and Optimization (CGO\u201912). 165\u2013174."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3110224.3110236"},{"key":"e_1_2_1_34_1","volume-title":"Technical introduction to OpenEXR. Industrial Light & Magic","author":"Kainz Florian","year":"2009","unstructured":"Florian Kainz, Rod Bogart, and Piotr Stanczyk. 2009. Technical introduction to OpenEXR. Industrial Light & Magic (2009), 21."},{"key":"e_1_2_1_35_1","volume-title":"Practical Parallel Processing for Today\u2019s Rendering Challenges. SIGGRAPH 2001 Course Note #40","author":"Kato Toshiaki","unstructured":"Toshiaki Kato, Hitoshi Nishimura, Tadashi Endo, Tamotsu Maruyama, Jun Saito, and Per H. Christensen. 2001. Parallel rendering and the quest for realism: The \u201cKilauea\u201d massively parallel ray tracer. In Alan Chalmers, Practical Parallel Processing for Today\u2019s Rendering Challenges. SIGGRAPH 2001 Course Note #40. IV\u20131 to IV\u201359."},{"key":"e_1_2_1_36_1","unstructured":"M. J. Keates and R. J. Hubbold. 1994. Accelerated Ray Tracing on the KSR1 Virtual Shared-Memory Parallel Computer. 
Citeseer."},{"key":"e_1_2_1_37_1","doi-asserted-by":"crossref","unstructured":"Alexander Keller, Carsten W\u00e4chter, Matthias Raab, Daniel Seibert, Dietger van Antwerpen, Johann Kornd\u00f6rfer, and Lutz Kettner. 2017. The Iray Light Transport Simulation and Rendering System. Retrieved from https:\/\/arxiv.org\/abs\/1705.01263.","DOI":"10.1145\/3084363.3085050"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3123939.3123968"},{"key":"e_1_2_1_39_1","volume-title":"Advances in Neural Information Processing Systems","author":"Krizhevsky Alex","year":"2012","unstructured":"Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, Vol. 2. MIT Press, 1097\u20131105."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3180495"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2019.2928289"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2018.8573483"},{"key":"e_1_2_1_43_1","unstructured":"Maxon. 2019. Redshift. Retrieved from https:\/\/www.redshift3d.com\/product\/features#all."},{"key":"e_1_2_1_44_1","article-title":"Memory bandwidth and machine balance in current high performance computers","author":"McCalpin John D.","year":"1995","unstructured":"John D. McCalpin. 1995. Memory bandwidth and machine balance in current high performance computers. 
IEEE Comput. Soc. Tech. Comm. Comput. Arch. Newslett. (Dec. 1995), 19\u201325.","journal-title":"IEEE Comput. Soc. Tech. Comm. Comput. Arch. Newslett."},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/38.291528"},{"key":"e_1_2_1_46_1","unstructured":"Patrick Mours. 2019. Accelerating Cycles using NVIDIA RTX. Retrieved from https:\/\/code.blender.org\/2019\/07\/accelerating-cycles-using-nvidia-rtx\/."},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2013.261"},{"key":"e_1_2_1_48_1","first-page":"061","volume-title":"Eurographics Symposium on Parallel Graphics and Visualization","author":"Navr\u00e1til Paul A.","year":"2012","unstructured":"Paul A. Navr\u00e1til, Donald S. Fussell, Calvin Lin, and Hank Childs. 2012. Dynamic scheduling for large-scale distributed-memory ray tracing. In Eurographics Symposium on Parallel Graphics and Visualization, Hank Childs, Torsten Kuhlen, and Fabio Marton (Eds.). The Eurographics Association. DOI:https:\/\/doi.org\/10.2312\/EGPGV\/EGPGV12\/061-070"},{"key":"e_1_2_1_49_1","unstructured":"NVIDIA. 2017. NVIDIA Tesla V100 GPU Architecture. 
Retrieved from http:\/\/images.nvidia.com\/content\/volta-architecture\/pdf\/volta-architecture-whitepaper.pdf."},{"key":"e_1_2_1_50_1","unstructured":"NVIDIA. 2018a. CUDA C Programming Guide. Retrieved from https:\/\/docs.nvidia.com\/cuda\/archive\/10.0\/pdf\/CUDA_C_Programming_Guide.pdf."},{"key":"e_1_2_1_51_1","unstructured":"NVIDIA. 2018b. CUDA Runtime API. Retrieved from https:\/\/docs.nvidia.com\/cuda\/archive\/10.0\/pdf\/CUDA_Runtime_API.pdf."},{"key":"e_1_2_1_52_1","unstructured":"NVIDIA. 2018c. NVIDIA NVSWITCH Technical Overview. Retrieved from https:\/\/images.nvidia.com\/content\/pdf\/nvswitch-technical-overview.pdf."},{"key":"e_1_2_1_53_1","unstructured":"NVIDIA. 2018d. NVIDIA Turing GPU Architecture. Retrieved from https:\/\/www.nvidia.com\/content\/dam\/en-zz\/Solutions\/design-visualization\/technologies\/turing-architecture\/NVIDIA-Turing-Architecture-Whitepaper.pdf."},{"key":"e_1_2_1_54_1","unstructured":"NVIDIA. 2019. DGX-2\/2H SYSTEM User Guide. Retrieved from https:\/\/docs.nvidia.com\/dgx\/pdf\/dgx2-user-guide.pdf. 
DU-09130-001_v08.1."},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/1778765.1778774"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/300523.300537"},{"key":"e_1_2_1_57_1","volume-title":"Proceedings of the IEEE Visualization Conference","author":"Parker Steven","year":"1998","unstructured":"Steven Parker, Peter Shirley, Yarden Livnat, Charles Hansen, and Peter-Pike Sloan. 1998. Interactive ray tracing for isosurface rendering. In Proceedings of the IEEE Visualization Conference. 233\u2013238."},{"key":"e_1_2_1_58_1","unstructured":"PIXAR. 2019. OpenSUBDIV. Retrieved from http:\/\/graphics.pixar.com\/opensubdiv\/."},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/HICSS.1995.375407"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1109\/88.494605"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/2464996.2465023"},{"key":"e_1_2_1_62_1","unstructured":"Nikolay Sakharnykh. 2017a. Maximizing Unified Memory Performance in CUDA. Retrieved from https:\/\/devblogs.nvidia.com\/maximizing-unified-memory-performance-cuda\/."},{"key":"e_1_2_1_63_1","volume-title":"Proceedings of the GPU Technology Conference (GTC\u201917)","author":"Sakharnykh Nikolay","year":"2017","unstructured":"Nikolay Sakharnykh. 2017b. Unified memory on Pascal and Volta. In Proceedings of the GPU Technology Conference (GTC\u201917). 
Retrieved from http:\/\/on-demand.gputechconf.com\/gtc\/2017\/presentation\/s7285-nikolay-sakharnykh-unified-memory-on-pascal-and-volta.pdf."},{"key":"e_1_2_1_64_1","volume-title":"Proceedings of the GPU Technology Conference (GTC\u201919)","author":"Sarosh Irani","year":"2019","unstructured":"Irani Sarosh. 2019. Accelerated computing solutions for AI and HPC workloads. In Proceedings of the GPU Technology Conference (GTC\u201919). Retrieved from https:\/\/developer.nvidia.com\/gtc\/2019\/video\/S9981."},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1145\/2688500.2688526"},{"key":"e_1_2_1_66_1","volume-title":"Proceedings of the IEEE International Conference on Big Data (IEEE BigData\u201914)","author":"Shamoto Hideyuki","year":"2015","unstructured":"Hideyuki Shamoto, Koichi Shirahata, Aleksandr Drozd, Hitoshi Sato, and Satoshi Matsuoka. 2015. Large-scale distributed sorting for GPU-based heterogeneous supercomputers. In Proceedings of the IEEE International Conference on Big Data (IEEE BigData\u201914). 
510\u2013518."},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1109\/2.299410"},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1145\/3105762.3105784"},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1145\/279358.279403"},{"key":"e_1_2_1_70_1","unstructured":"Blender Studio. 2020a. Agent 327\u2014Blender Cloud. Retrieved from https:\/\/cloud.blender.org\/films\/agent-327."},{"key":"e_1_2_1_71_1","unstructured":"Blender Studio. 2020b. Spring\u2014Blender Cloud. Retrieved from https:\/\/cloud.blender.org\/films\/spring."},{"key":"e_1_2_1_72_1","unstructured":"The Art Institute of Chicago. 2020. Discover Art & Artists. Retrieved from https:\/\/www.artic.edu\/collection."},{"key":"e_1_2_1_73_1","unstructured":"Threedscans. 2020. Three D Scans. Retrieved from https:\/\/threedscans.com."},{"key":"e_1_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.1111\/cgf.13702"},{"key":"e_1_2_1_75_1","volume-title":"Proceedings of the IEEE Symposium on Parallel and Large-Data Visualization and Graphics (PVG\u201903)","author":"Wald I.","unstructured":"I. Wald, C. Benthin, and P. Slusallek. 2003. Distributed interactive ray tracing of dynamic scenes. In Proceedings of the IEEE Symposium on Parallel and Large-Data Visualization and Graphics (PVG\u201903). 
77\u201385."},{"key":"e_1_2_1_76_1","doi-asserted-by":"publisher","DOI":"10.1145\/2601097.2601199"},{"key":"e_1_2_1_77_1","unstructured":"Walt Disney Animation Studios. 2018. Moana Island Scene. Retrieved from https:\/\/www.disneyanimation.com\/resources\/moana-island-scene\/."},{"key":"e_1_2_1_78_1","doi-asserted-by":"publisher","DOI":"10.1145\/2508363.2508413"},{"key":"e_1_2_1_79_1","volume-title":"Proceedings of the International Symposium on Computer Architecture","author":"Xie Chenhao","unstructured":"Chenhao Xie, Fu Xin, Mingsong Chen, and Shuaiwen L. Song. 2019. OO-VR: NUMA friendly object-oriented VR rendering framework for future NUMA-based multi-GPU systems. In Proceedings of the International Symposium on Computer Architecture. 
53\u201365."},{"key":"e_1_2_1_80_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2018.00035"},{"key":"e_1_2_1_81_1","doi-asserted-by":"publisher","DOI":"10.1145\/1618452.1618501"}],"container-title":["ACM Transactions on Graphics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3447807","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3447807","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:48:05Z","timestamp":1750193285000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3447807"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,4,27]]},"references-count":80,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2021,4,30]]}},"alternative-id":["10.1145\/3447807"],"URL":"https:\/\/doi.org\/10.1145\/3447807","relation":{},"ISSN":["0730-0301","1557-7368"],"issn-type":[{"value":"0730-0301","type":"print"},{"value":"1557-7368","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,4,27]]},"assertion":[{"value":"2020-09-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-01-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-04-27","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}