{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:09:27Z","timestamp":1750219767247,"version":"3.41.0"},"reference-count":61,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2023,7,19]],"date-time":"2023-07-19T00:00:00Z","timestamp":1689724800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"NSF","award":["CCF-2010830"],"award-info":[{"award-number":["CCF-2010830"]}]},{"DOI":"10.13039\/100006602","name":"AFRL","doi-asserted-by":"crossref","award":["FA9550-18-1-0166"],"award-info":[{"award-number":["FA9550-18-1-0166"]}],"id":[{"id":"10.13039\/100006602","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2023,9,30]]},"abstract":"<jats:p>\n            This article introduces turn-based spatiotemporal coherence. Spatiotemporal coherence is a novel coherence implementation that assigns write permission to epochs (or turns) as opposed to a processor core. This paradigm shift in the assignment of write permissions satisfies all conditions of a coherence protocol with virtually no coherence overhead. We discuss the implementation of this coherence mechanism on a baseline GPU. 
The evaluation shows that spatiotemporal coherence achieves a speedup of 7.13% for workloads with read data reuse across kernels compared to the baseline software-managed GPU coherence implementation while also providing write atomicity and avoiding the need for software inserted acquire-release operations.\n            <jats:xref ref-type=\"fn\">\n              <jats:sup>1<\/jats:sup>\n            <\/jats:xref>\n          <\/jats:p>","DOI":"10.1145\/3593054","type":"journal-article","created":{"date-parts":[[2023,5,10]],"date-time":"2023-05-10T12:13:31Z","timestamp":1683720811000},"page":"1-27","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Turn-based Spatiotemporal Coherence for GPUs"],"prefix":"10.1145","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0285-5742","authenticated-orcid":false,"given":"Sooraj","family":"Puthoor","sequence":"first","affiliation":[{"name":"University of Wisconsin-Madison, AMD Research"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8535-9244","authenticated-orcid":false,"given":"Mikko H.","family":"Lipasti","sequence":"additional","affiliation":[{"name":"University of Wisconsin-Madison"}]}],"member":"320","published-online":{"date-parts":[[2023,7,19]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/2.546611"},{"key":"e_1_3_2_3_2","doi-asserted-by":"crossref","unstructured":"Jade Alglave Daniel Kroening Vincent Nimal and Michael Tautschnig. 2012. Software verification for weak memory via program transformation. Retrieved from http:\/\/arxiv.org\/abs\/1207.7264.","DOI":"10.1007\/978-3-642-37036-6_28"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.5555\/2032305.2032311"},{"key":"e_1_3_2_5_2","first-page":"341","volume-title":"Proceedings of the 45th Annual IEEE\/ACM International Symposium on Microarchitecture","author":"Alisafaee M.","year":"2012","unstructured":"M. Alisafaee. 2012. 
Spatiotemporal coherence tracking. In Proceedings of the 45th Annual IEEE\/ACM International Symposium on Microarchitecture. 341\u2013350."},{"key":"e_1_3_2_6_2","volume-title":"Proceedings of the 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201916)","author":"Alsop Johnathan","year":"2016","unstructured":"Johnathan Alsop, Marc S. Orr, Bradford M. Beckmann, and David A. Wood. 2016. Lazy release consistency for GPUs. In Proceedings of the 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201916). IEEE Press, Piscataway, NJ, Article 26, 13 pages. Retrieved from http:\/\/dl.acm.org\/citation.cfm?id=3195638.3195669."},{"key":"e_1_3_2_7_2","article-title":"AMD Graphics Cores Next (GCN) Architecture","year":"2012","unstructured":"AMD. 2012. AMD Graphics Cores Next (GCN) Architecture. Retrieved from https:\/\/goo.gl\/GPvy8R.","journal-title":"Retrieved from"},{"key":"e_1_3_2_8_2","unstructured":"AMD. 2016. AMD GCN3 ISA Architecture Manual. Retrieved from https:\/\/gpuopen.com\/compute-product\/amd-gcn3-isa-architecture-manual."},{"key":"e_1_3_2_9_2","unstructured":"AMD. 2016. HCC Example Apps. Retrieved from https:\/\/github.com\/ROCm-Developer-Tools\/HCC-Example-Application."},{"key":"e_1_3_2_10_2","unstructured":"AMD. 2019. Compute Apps. Retrieved from https:\/\/github.com\/AMDComputeLibraries\/ComputeApps."},{"key":"e_1_3_2_11_2","unstructured":"AMD. 2019. User Guide for AMDGPU Backend. 
Retrieved from https:\/\/llvm.org\/docs\/AMDGPUUsage.html."},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/1706299.1706303"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/2989081.2989092"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/3178487.3178492"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2011.281"},{"key":"e_1_3_2_16_2","volume-title":"Proceedings of the IEEE Hot Chips 26 Symposium (HCS\u201914)","author":"Bouvier D.","year":"2014","unstructured":"D. Bouvier and B. Sander. 2014. Applying AMD\u2019s kaveri APU for heterogeneous computing. In Proceedings of the IEEE Hot Chips 26 Symposium (HCS\u201914)."},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/1250734.1250737"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-70545-1_12"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/2001420.2001436"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2013.6704684"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2009.5306797"},{"key":"e_1_3_2_22_2","first-page":"155","volume-title":"Proceedings of the International Conference on Parallel Architectures and Compilation Techniques","author":"Choi B.","year":"2011","unstructured":"B. Choi, R. Komuravelli, H. Sung, R. Smolinski, N. Honarmand, S. V. Adve, V. S. Adve, N. P. Carter, and C. Chou. 2011. DeNovo: Rethinking the memory hierarchy for disciplined parallelism. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques. 155\u2013166."},{"key":"e_1_3_2_23_2","unstructured":"HSA Foundation. 2016. HSA platform system architecture specification 1.1. 
Retrieved from http:\/\/www.hsafoundation.com\/?ddownload=5114."},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2018.00058"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/1555754.1555779"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2014.6835930"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/2.707614"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/2541940.2541981"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/3410463.3414623"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2014.6835938"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/2716282.2716291"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/mm.2016.24"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/3243176.3243179"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1145\/2485922.2485964"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1145\/1105734.1105747"},{"key":"e_1_3_2_36_2","unstructured":"NVIDIA. 2009. Nvidia Tesla V100 GPU Architecture. 
Retrieved from https:\/\/images.nvidia.com\/content\/volta-architecture\/pdf\/volta-architecture-whitepaper.pdf."},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/2678373.2665701"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/2540708.2540747"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/2884045.2884052"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/3243176.3243203"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/3428153"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1145\/3180270.3180271"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2017.40"},{"key":"e_1_3_2_44_2","first-page":"582","volume-title":"Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA\u201920)","author":"Ren X.","year":"2020","unstructured":"X. Ren, D. Lustig, E. Bolotin, A. Jaleel, O. Villa, and D. Nellans. 2020. HMG: Extending cache coherence protocols across modern hierarchical multi-GPU systems. In Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA\u201920). 582\u2013595."},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2016.7783736"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2018.00028"},{"key":"e_1_3_2_47_2","unstructured":"Keun Sup Shim Myong Hyon Cho Mieszko Lis Omer Khan and Srinivas Devadas. 2011. Library cache coherence. 
http:\/\/hdl.handle.net\/1721.1\/62580."},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1145\/2830772.2830821"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080206"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1145\/2830772.2830778"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.5555\/2337159.2337220"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2013.6522351"},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.5555\/2028905"},{"key":"e_1_3_2_54_2","first-page":"403","volume-title":"Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA\u201918)","author":"Tabbakh Abdulaziz","year":"2018","unstructured":"Abdulaziz Tabbakh, Xuehai Qian, and Murali Annavaram. 2018. G-TSC: Timestamp based coherence for GPUs. In Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA\u201918). IEEE, 403\u2013415."},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1145\/2716282.2716283"},{"key":"e_1_3_2_56_2","first-page":"132","volume-title":"Proceedings of the IEEE 17th International Symposium on High Performance Computer Architecture","author":"Vantrease D.","year":"2011","unstructured":"D. Vantrease, M. H. Lipasti, and N. Binkert. 2011. Atomic coherence: Leveraging nanophotonics to build race-free cache coherence protocols. In Proceedings of the IEEE 17th International Symposium on High Performance Computer Architecture. 
132\u2013143."},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2015.56"},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2018.00035"},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2015.12"},{"key":"e_1_3_2_60_2","first-page":"261","volume-title":"Proceedings of the International Conference on Parallel Architecture and Compilation Techniques (PACT\u201916)","author":"Yu Xiangyao","year":"2016","unstructured":"Xiangyao Yu, Hongzhe Liu, Ethan Zou, and Srinivas Devadas. 2016. Tardis 2.0: Optimized time traveling coherence for relaxed consistency models. In Proceedings of the International Conference on Parallel Architecture and Compilation Techniques (PACT\u201916). IEEE, 261\u2013274."},{"key":"e_1_3_2_61_2","unstructured":"Sizhuo Zhang Arvind and Muralidaran Vijayaraghavan. 2016. Taming weak memory models. Retrieved from http:\/\/arxiv.org\/abs\/1606.05416."},{"key":"e_1_3_2_62_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2018.00021"}],"container-title":["ACM Transactions on Architecture and Code 
Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3593054","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3593054","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:37:19Z","timestamp":1750178239000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3593054"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,7,19]]},"references-count":61,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2023,9,30]]}},"alternative-id":["10.1145\/3593054"],"URL":"https:\/\/doi.org\/10.1145\/3593054","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2023,7,19]]},"assertion":[{"value":"2022-08-04","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-03-12","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-07-19","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}