{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,17]],"date-time":"2026-01-17T12:02:32Z","timestamp":1768651352185,"version":"3.49.0"},"reference-count":73,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2022,2,24]],"date-time":"2022-02-24T00:00:00Z","timestamp":1645660800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Meas. Anal. Comput. Syst."],"published-print":{"date-parts":[[2022,2,24]]},"abstract":"<jats:p>Multi-application execution in Graphics Processing Units (GPUs), a promising way to utilize GPU resources, is still challenging. Some prior work (e.g., spatial multitasking) offers limited opportunity to improve resource utilization, while other work (e.g., simultaneous multi-kernel) provides fine-grained resource sharing at the price of unfair execution. This paper proposes a new multi-application paradigm for GPUs, called NURA, that has high potential to improve resource utilization while ensuring fairness and Quality-of-Service (QoS). The key idea is that each streaming multiprocessor (SM) executes Cooperative Thread Arrays (CTAs) belonging to only one application (as in spatial multitasking) and shares its unused resources with SMs running other applications that demand more resources. NURA handles the resource-sharing process mainly in software to provide simplicity, low hardware cost, and flexibility. We also make some hardware modifications as architectural support for this software-based proposal. We conservatively analyze the hardware cost of our proposal and observe an area overhead of less than 1.07% of the whole GPU die. 
Our experimental results over various mixes of GPU workloads show that NURA improves GPU system throughput by 26% on average compared to state-of-the-art spatial multitasking, while meeting the QoS target. In terms of fairness, NURA achieves results close to those of spatial multitasking, while outperforming simultaneous multi-kernel by 76% on average.<\/jats:p>","DOI":"10.1145\/3508036","type":"journal-article","created":{"date-parts":[[2022,2,28]],"date-time":"2022-02-28T23:44:29Z","timestamp":1646091869000},"page":"1-27","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":15,"title":["NURA"],"prefix":"10.1145","volume":"6","author":[{"given":"Sina","family":"Darabi","sequence":"first","affiliation":[{"name":"Sharif University of Technology, Tehran, Iran"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Negin","family":"Mahani","sequence":"additional","affiliation":[{"name":"Sharif University of Technology, Tehran, Iran"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hazhir","family":"Baxishi","sequence":"additional","affiliation":[{"name":"Sharif University of Technology, Tehran, Iran"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ehsan","family":"Yousefzadeh-Asl-Miandoab","sequence":"additional","affiliation":[{"name":"Sharif University of Technology, Tehran, Iran"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mohammad","family":"Sadrosadati","sequence":"additional","affiliation":[{"name":"Institute for Research in Fundamental Sciences (IPM), Tehran, Iran"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hamid","family":"Sarbazi-Azad","sequence":"additional","affiliation":[{"name":"Sharif University of Technology & Institute for Research in Fundamental Sciences (IPM), Tehran, 
Iran"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,2,28]]},"reference":[{"key":"e_1_2_1_2_1","volume-title":"Hardware Support for Prescient Instruction Prefetch. In 10th International Symposium on High Performance Computer Architecture (HPCA'04)","author":"Aamodt T. M.","year":"2004","unstructured":"T. M. Aamodt , P. Chow , P. Hammarlund , Hong Wang , and J. P. Shen . 2004 . Hardware Support for Prescient Instruction Prefetch. In 10th International Symposium on High Performance Computer Architecture (HPCA'04) . 84--84. https: \/\/doi.org\/10.1109\/HPCA. 2004 .10028 10.1109\/HPCA.2004.10028 T. M. Aamodt, P. Chow, P. Hammarlund, Hong Wang, and J. P. Shen. 2004. Hardware Support for Prescient Instruction Prefetch. In 10th International Symposium on High Performance Computer Architecture (HPCA'04). 84--84. https: \/\/doi.org\/10.1109\/HPCA.2004.10028"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2012.6168946"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASPDAC.2014.6742976"},{"key":"e_1_2_1_5_1","doi-asserted-by":"crossref","unstructured":"Akhil Arunkumar Evgeny Bolotin David Nellans and Carole-Jean Wu. 2019. Understanding the Future of Energy Efficiency in Multi-Module GPUs. In to appear) Proceedings of the IEEE International Symposium on High Performance Computer Architecture.  Akhil Arunkumar Evgeny Bolotin David Nellans and Carole-Jean Wu. 2019. Understanding the Future of Energy Efficiency in Multi-Module GPUs. In to appear) Proceedings of the IEEE International Symposium on High Performance Computer Architecture.","DOI":"10.1109\/HPCA.2019.00063"},{"key":"e_1_2_1_6_1","volume-title":"Studying Execution Time and Memory Transfer Time of Image Processing Using GPU Cards. In 2021 IEEE 11th Annual Computing and Communication Workshop and Conference (CCWC). 
IEEE, 0689--0695","author":"Asaduzzaman Abu","year":"2021","unstructured":"Abu Asaduzzaman , Srinivas Jojigiri , Thushar Sabu , and Sanath Tailam . 2021 . Studying Execution Time and Memory Transfer Time of Image Processing Using GPU Cards. In 2021 IEEE 11th Annual Computing and Communication Workshop and Conference (CCWC). IEEE, 0689--0695 . Abu Asaduzzaman, Srinivas Jojigiri, Thushar Sabu, and Sanath Tailam. 2021. Studying Execution Time and Memory Transfer Time of Image Processing Using GPU Cards. In 2021 IEEE 11th Annual Computing and Communication Workshop and Conference (CCWC). IEEE, 0689--0695."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2009.4919648"},{"key":"e_1_2_1_8_1","first-page":"21","article-title":"Enabling GPGPU low-level hardware explorations with MIAOW: an open-source RTL implementation of a GPGPU","volume":"12","author":"Balasubramanian Raghuraman","year":"2015","unstructured":"Raghuraman Balasubramanian , Vinay Gangadhar , Ziliang Guo , Chen-Han Ho , Cherin Joseph , Jaikrishnan Menon , Mario Paulo Drumond , Robin Paul , Sharath Prasad , Pradip Valathol , 2015 . Enabling GPGPU low-level hardware explorations with MIAOW: an open-source RTL implementation of a GPGPU . ACM Transactions on Architecture and Code Optimization (TACO) 12 , 2 (2015), 21 . Raghuraman Balasubramanian, Vinay Gangadhar, Ziliang Guo, Chen-Han Ho, Cherin Joseph, Jaikrishnan Menon, Mario Paulo Drumond, Robin Paul, Sharath Prasad, Pradip Valathol, et al. 2015. Enabling GPGPU low-level hardware explorations with MIAOW: an open-source RTL implementation of a GPGPU. 
ACM Transactions on Architecture and Code Optimization (TACO) 12, 2 (2015), 21.","journal-title":"ACM Transactions on Architecture and Code Optimization (TACO)"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2764908"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/ECRTS.2012.15"},{"key":"e_1_2_1_11_1","volume-title":"GPU performance analysis and optimisation","author":"Bradley Thomas","year":"2012","unstructured":"Thomas Bradley . 2012. GPU performance analysis and optimisation . NVIDIA Corporation ( 2012 ). Thomas Bradley. 2012. GPU performance analysis and optimisation. NVIDIA Corporation (2012)."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2009.5306797"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3018743.3018748"},{"key":"e_1_2_1_14_1","volume-title":"Buddy Compression: Enabling Larger Memory for Deep Learning and HPC Workloads on GPUs. In 2020 ACM\/IEEE 47th Annual International Symposium on Computer Architecture (ISCA). 926--939","author":"Choukse E.","year":"2020","unstructured":"E. Choukse , M. B. Sullivan , M. O'Connor , M. Erez , J. Pool , D. Nellans , and S. W. Keckler . 2020 . Buddy Compression: Enabling Larger Memory for Deep Learning and HPC Workloads on GPUs. In 2020 ACM\/IEEE 47th Annual International Symposium on Computer Architecture (ISCA). 926--939 . https:\/\/doi.org\/10.1109\/ISCA45697. 2020 .00080 10.1109\/ISCA45697.2020.00080 E. Choukse, M. B. Sullivan, M. O'Connor, M. Erez, J. Pool, D. Nellans, and S. W. Keckler. 2020. Buddy Compression: Enabling Larger Memory for Deep Learning and HPC Workloads on GPUs. In 2020 ACM\/IEEE 47th Annual International Symposium on Computer Architecture (ISCA). 926--939. https:\/\/doi.org\/10.1109\/ISCA45697.2020.00080"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3392717.3392771"},{"key":"e_1_2_1_16_1","volume-title":"Proceedings. 34th ACM\/IEEE International Symposium on Microarchitecture. 
MICRO-34. 306--317","author":"Collins J. D.","year":"2001","unstructured":"J. D. Collins, D. M. Tullsen, Hong Wang, and J. P. Shen. 2001. Dynamic speculative precomputation. In Proceedings. 34th ACM\/IEEE International Symposium on Microarchitecture. MICRO-34. 306--317. https:\/\/doi.org\/10.1109\/MICRO.2001.991128"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2018.00027"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2018.2866246"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3001589"},{"key":"e_1_2_1_20_1","volume-title":"Themis: Predicting and Reining in Application-level Slowdown on Spatial Multitasking GPUs. In 2019 IEEE International Parallel Distributed Processing Symposium (IPDPS).","author":"W. Zhao","year":"2019","unstructured":"W. Zhao et al. 2019 . Themis: Predicting and Reining in Application-level Slowdown on Spatial Multitasking GPUs. In 2019 IEEE International Parallel Distributed Processing Symposium (IPDPS). W. Zhao et al. 2019. Themis: Predicting and Reining in Application-level Slowdown on Spatial Multitasking GPUs. 
In 2019 IEEE International Parallel Distributed Processing Symposium (IPDPS)."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3368089.3417050"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2012.18"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/L-CA.2009.4"},{"key":"e_1_2_1_24_1","volume-title":"CLIJ: GPU-accelerated image processing for everyone. Nature methods 17, 1","author":"Haase Robert","year":"2020","unstructured":"Robert Haase , Loic A Royer , Peter Steinbach , Deborah Schmidt , Alexandr Dibrov , Uwe Schmidt , MartinWeigert, Nicola Maghelli , Pavel Tomancak , Florian Jug , 2020 . CLIJ: GPU-accelerated image processing for everyone. Nature methods 17, 1 (2020), 5--6. Robert Haase, Loic A Royer, Peter Steinbach, Deborah Schmidt, Alexandr Dibrov, Uwe Schmidt, MartinWeigert, Nicola Maghelli, Pavel Tomancak, Florian Jug, et al. 2020. CLIJ: GPU-accelerated image processing for everyone. Nature methods 17, 1 (2020), 5--6."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2830772.2830784"},{"key":"e_1_2_1_26_1","article-title":"Exploiting Bank Conflict-Based Side-Channel Timing Leakage of GPUs","volume":"16","author":"Jiang Zhen Hang","year":"2019","unstructured":"Zhen Hang Jiang , Yunsi Fei , and David Kaeli . 2019 . Exploiting Bank Conflict-Based Side-Channel Timing Leakage of GPUs . ACM Trans. Archit. Code Optim. 16 , 4, Article 42 (Nov. 2019), 24 pages. https:\/\/doi.org\/10.1145\/3361870 10.1145\/3361870 Zhen Hang Jiang, Yunsi Fei, and David Kaeli. 2019. Exploiting Bank Conflict-Based Side-Channel Timing Leakage of GPUs. ACM Trans. Archit. Code Optim. 16, 4, Article 42 (Nov. 2019), 24 pages. https:\/\/doi.org\/10.1145\/3361870","journal-title":"ACM Trans. Archit. 
Code Optim."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2016.7783717"},{"key":"e_1_2_1_28_1","first-page":"104","volume-title":"38th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO'05)","author":"Lu Jiwei","unstructured":"Jiwei Lu , A. Das , Wei-Chung Hsu , Khoa Nguyen , and S. G. Abraham . 2005. Dynamic helper threaded prefetching on the Sun UltraSPARC\/spl reg\/ CMP processor . In 38th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO'05) . 12 pp.-- 104 . https:\/\/doi.org\/10.1109\/MICRO.2005.18 10.1109\/MICRO.2005.18 Jiwei Lu, A. Das, Wei-Chung Hsu, Khoa Nguyen, and S. G. Abraham. 2005. Dynamic helper threaded prefetching on the Sun UltraSPARC\/spl reg\/ CMP processor. In 38th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO'05). 12 pp.--104. https:\/\/doi.org\/10.1109\/MICRO.2005.18"},{"key":"e_1_2_1_29_1","volume-title":"Proceedings of the 2011 USENIX Conference on USENIX Annual Technical Conference","author":"Kato Shinpei","year":"2011","unstructured":"Shinpei Kato , Karthik Lakshmanan , Ragunathan Rajkumar , and Yutaka Ishikawa . 2011 . TimeGraph: GPU Scheduling for Real-Time Multi-Tasking Environments . In Proceedings of the 2011 USENIX Conference on USENIX Annual Technical Conference ( Portland, OR) (USENIXATC'11). USENIX Association, USA, 2. Shinpei Kato, Karthik Lakshmanan, Ragunathan Rajkumar, and Yutaka Ishikawa. 2011. TimeGraph: GPU Scheduling for Real-Time Multi-Tasking Environments. In Proceedings of the 2011 USENIX Conference on USENIX Annual Technical Conference (Portland, OR) (USENIXATC'11). 
USENIX Association, USA, 2."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2011.89"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA45697.2020.00047"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2018.00073"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/605397.605415"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/LCA.2018.2889042"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/DAC18072.2020.9218711"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2018.00041"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/2485922.2485964"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611976137.10"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCAD.2017.8203754"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.2983731"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.2983731"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3123939.3124538"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3243734.3243831"},{"key":"e_1_2_1_44_1","first-page":"1","article-title":"Efficient Nearest-Neighbor Data Sharing in GPUs","volume":"18","author":"Nematollahi Negin","year":"2020","unstructured":"Negin Nematollahi , Mohammad Sadrosadati , Hajar Falahati , Marzieh Barkhordar , Mario Paulo Drumond , Hamid Sarbazi-Azad , and Babak Falsafi . 2020 . Efficient Nearest-Neighbor Data Sharing in GPUs . ACM Transactions on Architecture and Code Optimization (TACO) 18 , 1 (2020), 1 -- 26 . Negin Nematollahi, Mohammad Sadrosadati, Hajar Falahati, Marzieh Barkhordar, Mario Paulo Drumond, Hamid Sarbazi-Azad, and Babak Falsafi. 2020. Efficient Nearest-Neighbor Data Sharing in GPUs. 
ACM Transactions on Architecture and Code Optimization (TACO) 18, 1 (2020), 1--26.","journal-title":"ACM Transactions on Architecture and Code Optimization (TACO)"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/LCA.2018.2873679"},{"key":"e_1_2_1_46_1","unstructured":"NVIDIA. 2017. Volta architecture Whitepaper - NVIDIA File Downloads. https:\/\/images.nvidia.com\/content\/voltaarchitecture\/ pdf\/volta-architecture-whitepaper.pdf  NVIDIA. 2017. Volta architecture Whitepaper - NVIDIA File Downloads. https:\/\/images.nvidia.com\/content\/voltaarchitecture\/ pdf\/volta-architecture-whitepaper.pdf"},{"key":"e_1_2_1_47_1","volume-title":"P\u00e9ter Vingelmann, and Frank H.P. Fitzek.","author":"Ampere Whitepaper NVIDIA.","year":"2020","unstructured":"NVIDIA. 2021. Ampere Whitepaper . https:\/\/www.nvidia.com\/content\/dam\/en-zz\/Solutions\/Data-Center\/nvidia-amperearchitecture- whitepaper.pdf, [48] NVIDIA , P\u00e9ter Vingelmann, and Frank H.P. Fitzek. 2020 . CUDA , release: 10.2.89. https:\/\/developer.nvidia.com\/cudatoolkit NVIDIA. 2021. Ampere Whitepaper. https:\/\/www.nvidia.com\/content\/dam\/en-zz\/Solutions\/Data-Center\/nvidia-amperearchitecture- whitepaper.pdf, [48] NVIDIA, P\u00e9ter Vingelmann, and Frank H.P. Fitzek. 2020. CUDA, release: 10.2.89. https:\/\/developer.nvidia.com\/cudatoolkit"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/3093315.3037707"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA52012.2021.00019"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/3173162.3173211"},{"key":"e_1_2_1_51_1","volume-title":"Automation Test in Europe Conference Exhibition (DATE), 2017","author":"Sadrosadati M.","year":"2017","unstructured":"M. Sadrosadati , A. Mirhosseini , S. Roozkhosh , H. Bakhishi , and H. Sarbazi-Azad . 2017. Effective cache bank placement for GPUs. In Design , Automation Test in Europe Conference Exhibition (DATE), 2017 . 31--36. 
https: \/\/doi.org\/10.23919\/DATE. 2017 .7926954 10.23919\/DATE.2017.7926954 M. Sadrosadati, A. Mirhosseini, S. Roozkhosh, H. Bakhishi, and H. Sarbazi-Azad. 2017. Effective cache bank placement for GPUs. In Design, Automation Test in Europe Conference Exhibition (DATE), 2017. 31--36. https: \/\/doi.org\/10.23919\/DATE.2017.7926954"},{"key":"e_1_2_1_52_1","first-page":"769","article-title":"Scientific computing and computer graphics with GPU: application of projective geometry and principle of duality","volume":"15","author":"Skala V","year":"2020","unstructured":"V Skala , SAA Karim , and EA Kadir . 2020 . Scientific computing and computer graphics with GPU: application of projective geometry and principle of duality . Int. J. Math. Comput. Sci 15 , 3 (2020), 769 -- 777 . V Skala, SAA Karim, and EA Kadir. 2020. Scientific computing and computer graphics with GPU: application of projective geometry and principle of duality. Int. J. Math. Comput. Sci 15, 3 (2020), 769--777.","journal-title":"Int. J. Math. Comput. Sci"},{"key":"e_1_2_1_53_1","volume-title":"Geng Daniel Liu, and Wen-mei W Hwu","author":"Stratton John A","year":"2012","unstructured":"John A Stratton , Christopher Rodrigues , I- Jui Sung , Nady Obeid , Li-Wen Chang , Nasser Anssari , Geng Daniel Liu, and Wen-mei W Hwu . 2012 . Parboil : A revised benchmark suite for scientific and commercial throughput computing. Center for Reliable and High-Performance Computing 127 (2012). John A Stratton, Christopher Rodrigues, I-Jui Sung, Nady Obeid, Li-Wen Chang, Nasser Anssari, Geng Daniel Liu, and Wen-mei W Hwu. 2012. Parboil: A revised benchmark suite for scientific and commercial throughput computing. 
Center for Reliable and High-Performance Computing 127 (2012)."},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/2007116.2007131"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2016.7783718"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/2749469.2750399"},{"key":"e_1_2_1_57_1","volume-title":"Graviton: Trusted Execution Environments on GPUs. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18)","author":"Volos Stavros","year":"2018","unstructured":"Stavros Volos, Kapil Vaswani, and Rodrigo Bruno. 2018. Graviton: Trusted Execution Environments on GPUs. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18). USENIX Association, Carlsbad, CA, 681--696. https:\/\/www.usenix.org\/conference\/osdi18\/presentation\/volos"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/3126546"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2016.7446078"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/3140659.3080203"},{"key":"e_1_2_1_61_1","volume-title":"Leaky DNN: Stealing Deep-Learning Model Secret with GPU Context-Switching Side-Channel. In 2020 50th Annual IEEE\/IFIP International Conference on Dependable Systems and Networks (DSN). 125--137","author":"Wei J.","year":"2020","unstructured":"J. Wei , Y. Zhang , Z. Zhou , Z. Li , and M. A. Al Faruque . 2020 . 
Leaky DNN: Stealing Deep-Learning Model Secret with GPU Context-Switching Side-Channel. In 2020 50th Annual IEEE\/IFIP International Conference on Dependable Systems and Networks (DSN). 125--137 . https:\/\/doi.org\/10.1109\/DSN48063. 2020 .00031 10.1109\/DSN48063.2020.00031 J. Wei, Y. Zhang, Z. Zhou, Z. Li, and M. A. Al Faruque. 2020. Leaky DNN: Stealing Deep-Learning Model Secret with GPU Context-Switching Side-Channel. In 2020 50th Annual IEEE\/IFIP International Conference on Dependable Systems and Networks (DSN). 125--137. https:\/\/doi.org\/10.1109\/DSN48063.2020.00031"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2016.29"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1145\/3330345.3330389"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1145\/2370816.2370858"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2016.59"},{"key":"e_1_2_1_66_1","article-title":"Improving Thread-Level Parallelism in GPUs Through Expanding Register File to Scratchpad Memory","volume":"15","author":"Yu Chao","year":"2018","unstructured":"Chao Yu , Yuebin Bai , Qingxiao Sun , and Hailong Yang . 2018 . Improving Thread-Level Parallelism in GPUs Through Expanding Register File to Scratchpad Memory . ACM Trans. Archit. Code Optim. 15 , 4, Article 48 (Nov. 2018), 24 pages. https:\/\/doi.org\/10.1145\/3280849 10.1145\/3280849 Chao Yu, Yuebin Bai, Qingxiao Sun, and Hailong Yang. 2018. Improving Thread-Level Parallelism in GPUs Through Expanding Register File to Scratchpad Memory. ACM Trans. Archit. Code Optim. 15, 4, Article 48 (Nov. 2018), 24 pages. https:\/\/doi.org\/10.1145\/3280849","journal-title":"ACM Trans. Archit. 
Code Optim."},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1145\/3330345.3330351"},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2007.346187"},{"key":"e_1_2_1_69_1","volume-title":"Themis: Predicting and Reining in Application-Level Slowdown on Spatial Multitasking GPUs. In 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS). 653--663","author":"Zhao W.","year":"2019","unstructured":"W. Zhao , Q. Chen , H. Lin , J. Zhang , J. Leng , C. Li , W. Zheng , L. Li , and M. Guo . 2019 . Themis: Predicting and Reining in Application-Level Slowdown on Spatial Multitasking GPUs. In 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS). 653--663 . https:\/\/doi.org\/10.1109\/IPDPS. 2019 .00074 10.1109\/IPDPS.2019.00074 W. Zhao, Q. Chen, H. Lin, J. Zhang, J. Leng, C. Li, W. Zheng, L. Li, and M. Guo. 2019. Themis: Predicting and Reining in Application-Level Slowdown on Spatial Multitasking GPUs. In 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS). 653--663. 
https:\/\/doi.org\/10.1109\/IPDPS.2019.00074"},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1145\/3373376.3378457"},{"key":"e_1_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.1145\/3205289.3205311"},{"key":"e_1_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2018.2854764"},{"key":"e_1_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.1145\/3123939.3123978"},{"key":"e_1_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.1109\/RTAS.2015.7108420"}],"container-title":["Proceedings of the ACM on Measurement and Analysis of Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3508036","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3508036","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:12:29Z","timestamp":1750191149000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3508036"}},"subtitle":["A Framework for Supporting Non-Uniform Resource Accesses in GPUs"],"short-title":[],"issued":{"date-parts":[[2022,2,24]]},"references-count":73,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2022,2,24]]}},"alternative-id":["10.1145\/3508036"],"URL":"https:\/\/doi.org\/10.1145\/3508036","relation":{},"ISSN":["2476-1249"],"issn-type":[{"value":"2476-1249","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,2,24]]},"assertion":[{"value":"2022-02-28","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}