{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,5]],"date-time":"2026-05-05T04:16:47Z","timestamp":1777954607645,"version":"3.51.4"},"publisher-location":"New York, NY, USA","reference-count":36,"publisher":"ACM","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,7,16]]},"DOI":"10.1145\/3694906.3743331","type":"proceedings-article","created":{"date-parts":[[2025,7,16]],"date-time":"2025-07-16T16:19:56Z","timestamp":1752682796000},"page":"193-209","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["ReFINE: A Reactive and Fine-Grained Scheduling Framework For Concurrency on General Purpose GPUs"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0002-1382-8360","authenticated-orcid":false,"given":"Guin","family":"Gilman","sequence":"first","affiliation":[{"name":"Worcester Polytechnic Institute, Worcester, MA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1338-6403","authenticated-orcid":false,"given":"Robert J.","family":"Walls","sequence":"additional","affiliation":[{"name":"Worcester Polytechnic Institute, Worcester, MA, USA"}]}],"member":"320","published-online":{"date-parts":[[2025,7,16]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"2025. NVIDIA's Multi-Instance GPU User Guide."},{"key":"e_1_3_2_1_2_1","volume-title":"Mosaic: A GPU Memory Manager with Application-Transparent Support for Multiple Page Sizes. In 2017 50th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO).","author":"Ausavarungnirun R.","unstructured":"R. Ausavarungnirun, J. Landgraf, V. Miller, S. Ghose, J. Gandhi, C.J. Rossbach, and O. Mutlu. 2017. Mosaic: A GPU Memory Manager with Application-Transparent Support for Multiple Page Sizes. In 2017 50th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO)."},{"key":"e_1_3_2_1_3_1","volume-title":"MASK: Redesigning the GPU Memory Hierarchy to Support Multi-Application Concurrency. 53, 2","author":"Ausavarungnirun Rachata","year":"2018","unstructured":"Rachata Ausavarungnirun, Vance Miller, Joshua Landgraf, Saugata Ghose, Jayneel Gandhi, Adwait Jog, Christopher J. Rossbach, and Onur Mutlu. 2018. MASK: Redesigning the GPU Memory Hierarchy to Support Multi-Application Concurrency. 53, 2 (2018)."},{"key":"e_1_3_2_1_4_1","volume-title":"Phase Aware Warp Scheduling: Mitigating Effects of Phase Behavior in GPGPU Applications. In 2015 International Conference on Parallel Architecture and Compilation (PACT). 1--12","author":"Awatramani Mihir","year":"2015","unstructured":"Mihir Awatramani, Xian Zhu, Joseph Zambreno, and Diane Rover. 2015. Phase Aware Warp Scheduling: Mitigating Effects of Phase Behavior in GPGPU Applications. In 2015 International Conference on Parallel Architecture and Compilation (PACT). 1--12."},{"key":"e_1_3_2_1_5_1","volume-title":"PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20)","author":"Bai Zhihao","year":"2020","unstructured":"Zhihao Bai, Zhen Zhang, Yibo Zhu, and Xin Jin. 2020. PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20). USENIX Association, 499--514."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2925426.2926271"},{"key":"e_1_3_2_1_7_1","volume-title":"Deadline-Based Scheduling for GPU with Preemption Support. In 2018 IEEE Real-Time Systems Symposium (RTSS). 119--130","author":"Capodieci N.","unstructured":"N. Capodieci, R. Cavicchioli, M. Bertogna, and A. Paramakuru. 2018. Deadline-Based Scheduling for GPU with Preemption Support. In 2018 IEEE Real-Time Systems Symposium (RTSS). 119--130."},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2009.5306797"},{"key":"e_1_3_2_1_9_1","volume-title":"Serving Heterogeneous Machine Learning Models on Multi-GPU Servers with Spatio-Temporal Sharing. In 2022 USENIX Annual Technical Conference (USENIX ATC 22)","author":"Choi Seungbeom","year":"2022","unstructured":"Seungbeom Choi, Sunho Lee, Yeonjae Kim, Jongse Park, Youngjin Kwon, and Jaehyuk Huh. 2022. Serving Heterogeneous Machine Learning Models on Multi-GPU Servers with Spatio-Temporal Sharing. In 2022 USENIX Annual Technical Conference (USENIX ATC 22). USENIX Association, Carlsbad, CA, 199--216. https:\/\/www.usenix.org\/conference\/atc22\/presentation\/choi-seungbeom"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2018.00027"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3419111.3421284"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/L-CA.2013.9"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3307650.3322224"},{"key":"e_1_3_2_1_14_1","volume-title":"38th International Symposium on Computer Performance, Modeling, Measurements and Evaluation","author":"Gilman Guin R.","year":"2020","unstructured":"Guin R. Gilman, Samuel S. Ogden, Tian Guo, and Robert J. Walls. 2020. Demystifying the placement policies of the GPU thread block scheduler for concurrent kernels. In 38th International Symposium on Computer Performance, Modeling, Measurements and Evaluation 2020."},{"key":"e_1_3_2_1_15_1","volume-title":"Proceedings of the 6th USENIX Symposium on Operating Systems Design and Implementation. 539--558","author":"Han Mingcong","year":"2022","unstructured":"Mingcong Han, Hanze Zhang, Rong Chen, and Haibo Chen. 2022. Microsecond-scale Preemption for Concurrent GPU-accelerated DNN Inferences. In Proceedings of the 6th USENIX Symposium on Operating Systems Design and Implementation. 539--558."},{"key":"e_1_3_2_1_16_1","volume-title":"Accel-Sim: An Extensible Simulation Framework for Validated GPU Modeling. In 2020 ACM\/IEEE 47th Annual International Symposium on Computer Architecture (ISCA). 473--486","author":"Khairy Mahmoud","unstructured":"Mahmoud Khairy, Zhesheng Shen, Tor M. Aamodt, and Timothy G. Rogers. 2020. Accel-Sim: An Extensible Simulation Framework for Validated GPU Modeling. In 2020 ACM\/IEEE 47th Annual International Symposium on Computer Architecture (ISCA). 473--486."},{"key":"e_1_3_2_1_17_1","volume-title":"Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning. arXiv:2012.02732","author":"Kwon Woosuk","year":"2020","unstructured":"Woosuk Kwon, Gyeong-In Yu, Eunji Jeong, and Byung-Gon Chun. 2020. Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning. arXiv:2012.02732"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3542929.3563510"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"crossref","unstructured":"Zhen Lin Lars Nyland and Huiyang Zhou. 2016. Enabling Efficient Preemption for SIMT Architectures with Lightweight Context Switching. In SC '16: Proceedings of the International Conference for High Performance Computing Networking Storage and Analysis. 898--908.","DOI":"10.1109\/SC.2016.76"},{"key":"e_1_3_2_1_20_1","unstructured":"Sharan Narang and Greg Diamos. 2017. DeepBench. https:\/\/github.com\/baidu-research\/DeepBench"},{"key":"e_1_3_2_1_21_1","volume-title":"Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems","author":"Pai Sreepathi","unstructured":"Sreepathi Pai, Matthew J. Thazhuthaveetil, and R. Govindarajan. 2013. Improving GPGPU Concurrency with Elastic Kernels. In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems (Houston, Texas, USA) (ASPLOS '13)."},{"key":"e_1_3_2_1_22_1","first-page":"4","article-title":"Chimera: Collaborative Preemption for Multitasking on a Shared GPU","volume":"50","author":"Kyu Park Jason Jong","year":"2015","unstructured":"Jason Jong Kyu Park, Yongjun Park, and Scott Mahlke. 2015. Chimera: Collaborative Preemption for Multitasking on a Shared GPU. SIGPLAN Not. 50, 4 (April 2015).","journal-title":"SIGPLAN Not."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/2807591.2807598"},{"key":"e_1_3_2_1_24_1","volume-title":"Cache-Conscious Wavefront Scheduling. In 2012 45th Annual IEEE\/ACM International Symposium on Microarchitecture. 72--83","author":"Rogers Timothy G.","unstructured":"Timothy G. Rogers, Mike O'Connor, and Tor M. Aamodt. 2012. Cache-Conscious Wavefront Scheduling. In 2012 45th Annual IEEE\/ACM International Symposium on Microarchitecture. 72--83."},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/3341301.3359658"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.23919\/DATE54114.2022.9774761"},{"key":"e_1_3_2_1_27_1","volume-title":"Tullsen","author":"Snavely Allan","year":"2000","unstructured":"Allan Snavely and Dean M. Tullsen. 2000. Symbiotic job-scheduling for a simultaneous multithreaded processor. Operating Systems Review (2000)."},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3627703.3629578"},{"key":"e_1_3_2_1_29_1","volume-title":"Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing. https:\/\/api.semanticscholar.org\/CorpusID:497928","author":"Stratton John A.","year":"2012","unstructured":"John A. Stratton, Christopher I. Rodrigues, I-Jui Sung, Nady Obeid, Li-Wen Chang, Nasser Anssari, Geng Liu, and Wen mei W. Hwu. 2012. Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing. https:\/\/api.semanticscholar.org\/CorpusID:497928"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3037697.3037742"},{"key":"e_1_3_2_1_31_1","volume-title":"AntMan: Dynamic Scaling on GPU Clusters for Deep Learning. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20)","author":"Xiao Wencong","year":"2020","unstructured":"Wencong Xiao, Shiru Ren, Yong Li, Yang Zhang, Pengyang Hou, Zhi Li, Yihui Feng, Wei Lin, and Yangqing Jia. 2020. AntMan: Dynamic Scaling on GPU Clusters for Deep Learning. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20). USENIX Association, 533--548."},{"key":"e_1_3_2_1_32_1","volume-title":"2016 ACM\/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).","author":"Xu Q.","unstructured":"Q. Xu, H. Jeon, K. Kim, W. W. Ro, and M. Annavaram. 2016. Warped-Slicer: Efficient Intra-SM Slicing through Dynamic Resource Partitioning for GPU Multiprogramming. In 2016 ACM\/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA)."},{"key":"e_1_3_2_1_33_1","volume-title":"Characterization and Prediction of Performance Interference on Mediated Passthrough GPUs for Interference-aware Scheduler. In 11th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 19)","author":"Xu Xin","year":"2019","unstructured":"Xin Xu, Na Zhang, Michael Cui, Michael He, and Ridhi Surana. 2019. Characterization and Prediction of Performance Interference on Mediated Passthrough GPUs for Interference-aware Scheduler. In 11th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 19). USENIX Association, Renton, WA. https:\/\/www.usenix.org\/conference\/hotcloud19\/presentation\/xu-xin"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3372224.3419192"},{"key":"e_1_3_2_1_35_1","volume-title":"Laius: Towards latency awareness and improved utilization of spatial multitasking accelerators in datacenters. 58--68.","author":"Zhang Wei","year":"2019","unstructured":"Wei Zhang, Weihao Cui, Kaihua Fu, Quan Chen, Daniel Mawhirter, Bo Wu, Chao Li, and Minyi Guo. 2019. Laius: Towards latency awareness and improved utilization of spatial multitasking accelerators in datacenters. 58--68."},{"key":"e_1_3_2_1_36_1","volume-title":"2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).","author":"Zheng T.","unstructured":"T. Zheng, D. Nellans, A. Zulfiqar, M. Stephenson, and S. W. Keckler. 2016. Towards high performance paged memory for GPUs. In 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA)."}],"event":{"name":"SPAA '25: 37th ACM Symposium on Parallelism in Algorithms and Architectures","location":"Portland OR USA","acronym":"SPAA '25","sponsor":["SIGACT ACM Special Interest Group on Algorithms and Computation Theory","SIGARCH ACM Special Interest Group on Computer Architecture","EATCS European Association for Theoretical Computer Science"]},"container-title":["Proceedings of the 37th ACM Symposium on Parallelism in Algorithms and Architectures"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3694906.3743331","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,5,4]],"date-time":"2026-05-04T19:18:48Z","timestamp":1777922328000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3694906.3743331"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,16]]},"references-count":36,"alternative-id":["10.1145\/3694906.3743331","10.1145\/3694906"],"URL":"https:\/\/doi.org\/10.1145\/3694906.3743331","relation":{},"subject":[],"published":{"date-parts":[[2025,7,16]]},"assertion":[{"value":"2025-07-16","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}