{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,8]],"date-time":"2025-06-08T04:00:55Z","timestamp":1749355255727,"version":"3.41.0"},"reference-count":24,"publisher":"Institute of Electronics, Information and Communications Engineers (IEICE)","issue":"6","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IEICE Trans. Inf. &amp; Syst."],"published-print":{"date-parts":[[2025,6,1]]},"DOI":"10.1587\/transinf.2024edp7203","type":"journal-article","created":{"date-parts":[[2024,12,15]],"date-time":"2024-12-15T22:12:29Z","timestamp":1734300749000},"page":"558-569","source":"Crossref","is-referenced-by-count":0,"title":["Enhancing GPU Performance Through Complexity-Effective Out-of-Order Execution Using Distance-Based ISA"],"prefix":"10.1587","volume":"E108.D","author":[{"given":"Reoma","family":"MATSUO","sequence":"first","affiliation":[{"name":"Graduate School of Information Science and Technology, The University of Tokyo"}]},{"given":"Toru","family":"KOIZUMI","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Nagoya Institute of Technology"}]},{"given":"Hidetsugu","family":"IRIE","sequence":"additional","affiliation":[{"name":"Graduate School of Information Science and Technology, The University of Tokyo"}]},{"given":"Shuichi","family":"SAKAI","sequence":"additional","affiliation":[{"name":"Graduate School of Information Science and Technology, The University of Tokyo"}]},{"given":"Ryota","family":"SHIOYA","sequence":"additional","affiliation":[{"name":"Graduate School of Information Science and Technology, The University of Tokyo"}]}],"member":"532","reference":[{"key":"1","doi-asserted-by":"publisher","unstructured":"[1] R. Matsuo, T. Koizumi, H. Irie, S. Sakai, and R. Shioya, \u201cTURBULENCE: complexity-effective out-of-order execution on gpu with distance-based ISA,\u201d IEEE Comput. Architecture Letters, pp.175-178, 2023. 10.1109\/lca.2023.3289317","DOI":"10.1109\/LCA.2023.3289317"},{"key":"2","unstructured":"[2] NVIDIA, \u201cNVIDIA TURING GPU ARCHITECTURE,\u201d https:\/\/images.nvidia.com\/aem-dam\/en-zz\/Solutions\/design-visualization\/technologies\/turing-architecture\/NVIDIA-Turing-Architecture-Whitepaper.pdf, accessed May 15. 2024."},{"key":"3","unstructured":"[3] AMD, \u201cRDNA2 Instruction Set Architecture Manual,\u201dhttps:\/\/developer.amd.com\/wp-content\/resources\/RDNA2_Shader_ISA_November2020.pdf, accessed May 15. 2024."},{"key":"4","doi-asserted-by":"crossref","unstructured":"[4] S.-Y. Lee and C.-J. Wu, \u201cCharacterizing the latency hiding ability of GPUs,\u201d Proc. IEEE Int. Symp. Performance Analysis of Systems and Software, Monterey, CA, USA, pp.145-146, March 2014. 10.1109\/ispass.2014.6844477","DOI":"10.1109\/ISPASS.2014.6844477"},{"key":"5","doi-asserted-by":"crossref","unstructured":"[5] S.-Y. Lee, A. Arunkumar, and C.-J. Wu, \u201cCAWA: Coordinated warp scheduling and Cache Prioritization for critical warp acceleration of GPGPU workloads,\u201d Proc. ACM\/IEEE 42nd Annu. Int. Symp. Comput. Architecture, Portland, OR, USA, pp.515-527, June 2015. 10.1145\/2749469.2750418","DOI":"10.1145\/2749469.2750418"},{"key":"6","doi-asserted-by":"crossref","unstructured":"[6] Y. Arafa, A.-H.A. Badawy, G. Chennupati, N. Santhi, and S. Eidenbenz, \u201cLow Overhead Instruction Latency Characterization for NVIDIA GPGPUs,\u201d Proc. IEEE High Performance Extreme Computing Conference, Waltham, MA, USA, pp.1-8, Sept. 2019. 10.1109\/hpec.2019.8916466","DOI":"10.1109\/HPEC.2019.8916466"},{"key":"7","doi-asserted-by":"crossref","unstructured":"[7] A. Sembrant, T. Carlson, E. Hagersten, D. Black-Shaffer, A. Perais, A. Seznec, and P. Michaud, \u201cLong term parking (LTP): Criticality-aware resource allocation in OOO processors,\u201d Proc. ACM\/IEEE 48th Annu. Int. Symp. Microarchitecture, Waikiki, HI, USA, pp.334-346, Dec. 2015. 10.1145\/2830772.2830815","DOI":"10.1145\/2830772.2830815"},{"key":"8","doi-asserted-by":"crossref","unstructured":"[8] R. Shioya and H. Ando, \u201cEnergy efficiency improvement of renamed trace cache through the reduction of dependent path length,\u201d Proc. IEEE 32nd Int. Conf. Comput. Design, Seoul, Korea (South), pp.416-423, Oct. 2014. 10.1109\/iccd.2014.6974714","DOI":"10.1109\/ICCD.2014.6974714"},{"key":"9","doi-asserted-by":"crossref","unstructured":"[9] H. Irie, T. Koizumi, A. Fukuda, S. Akaki, S. Nakae, Y. Bessho, R. Shioya, T. Notsu, K. Yoda, T. Ishihara, and S. Sakai, \u201cSTRAIGHT: Hazardless processor architecture without register renaming,\u201d Proc. ACM\/IEEE 51st Annu. Int. Symp. Microarchitecture, Fukuoka, Japan, pp.121-133, Oct. 2018. 10.1109\/micro.2018.00019","DOI":"10.1109\/MICRO.2018.00019"},{"key":"10","doi-asserted-by":"crossref","unstructured":"[10] T. Koizumi, S. Sugita, R. Shioya, J. Kadomoto, H. Irie, and S. Sakai, \u201cCompiling and optimizing real-world programs for STRAIGHT ISA,\u201d Proc. IEEE 39th Int. Conf. Comput. Design, Storrs, CT, USA, pp.400-408, Oct. 2021. 10.1109\/ICCD53106.2021.00070","DOI":"10.1109\/ICCD53106.2021.00070"},{"key":"11","doi-asserted-by":"crossref","unstructured":"[11] M. Khairy, Z. Shen, T.M. Aamodt, and T.G. Rogers, \u201cAccel-Sim: An Extensible Simulation Framework for Validated GPU Modeling,\u201d Proc. ACM\/IEEE 47th Annu. Int. Symp. Comput. Architecture, Valencia, Spain, pp.473-486, May 2020. 10.1109\/isca45697.2020.00047","DOI":"10.1109\/ISCA45697.2020.00047"},{"key":"12","doi-asserted-by":"publisher","unstructured":"[12] S. Palacharla, N.P. Jouppi, and J.E. Smith, \u201cComplexity-effective superscalar processors,\u201d Proc. ACM\/IEEE 47th Annu. Int. Symp. Comput. Architecture, Denver, Colorado, USA, vol.25, no.2, pp.206-218, June 1997. 10.1145\/384286.264201","DOI":"10.1145\/384286.264201"},{"key":"13","doi-asserted-by":"crossref","unstructured":"[13] M. Goshima, K. Nishino, Y. Nakashima, S. Mori, T. Kitamura, and S. Tomita, \u201cA high-speed dynamic instruction scheduling scheme for superscalar processors,\u201d Proc. ACM\/IEEE 34th Annu. Int. Symp. Microarchitecture, Austin, TX, USA, pp.225-236, Dec. 2001. 10.1109\/micro.2001.991121","DOI":"10.1109\/MICRO.2001.991121"},{"key":"14","unstructured":"[14] B.-W. Coon, P.-C. Mills, S.-F. Oberman, and M.-Y. Siu, \u201cScoreboard having size indicators for tracking sequential destination register usage in a multi-threaded processor,\u201d U.S. Patent 8225076B1, USA, 2012."},{"key":"15","doi-asserted-by":"crossref","unstructured":"[15] K. Iliakis, S. Xydis, and D. Soudris, \u201cRepurposing GPU Microarchitectures with Light-Weight Out-Of-Order Execution,\u201d IEEE Trans. Parallel and Distributed Systems, vol.33, no.2, pp.388-402, 2022. 10.1109\/TPDS.2021.3093231","DOI":"10.1109\/TPDS.2021.3093231"},{"key":"16","doi-asserted-by":"crossref","unstructured":"[16] S. Che, M. Boyer, J. Meng, D. Tarjan, J.W. Sheaffer, S.H. Lee, and K. Skadron, \u201cRodinia: A benchmark suite for heterogeneous computing,\u201d Proc. IEEE Int. Symp. Workload Characterization, Austin, TX, USA, pp.44-54, Oct. 2009. 10.1109\/iiswc.2009.5306797","DOI":"10.1109\/IISWC.2009.5306797"},{"key":"17","doi-asserted-by":"crossref","unstructured":"[17] O. Villa, M. Stephenson, D. Nellans, and S.W. Keckler, \u201cNVBit: A Dynamic Binary Instrumentation Framework for NVIDIA GPUs,\u201d Proc. ACM\/IEEE 52nd Annu. Int. Symp. Microarchitecture, pp.372-383, Oct. 2019. 10.1145\/3352460.3358307","DOI":"10.1145\/3352460.3358307"},{"key":"18","doi-asserted-by":"crossref","unstructured":"[18] D. Yan, W. Wang, and X. Chu, \u201cOptimizing batched winograd convolution on gpus,\u201d Proc. 25th ACM SIGPLAN Symp. Principles and Practice of Parallel Programming, p.32-44, Feb. 2020. 10.1145\/3332466.3374520","DOI":"10.1145\/3332466.3374520"},{"key":"19","unstructured":"[19] NVIDIA, \u201cNVIDIA CUDA Compiler Driver NVCC,\u201d https:\/\/docs.nvidia.com\/cuda\/cuda-compiler-driver-nvcc\/index.html, accessed May 15. 2024."},{"key":"20","doi-asserted-by":"crossref","unstructured":"[20] T.G. Rogers, M. O\u2019Connor, and T.M. Aamodt, \u201cCache-Conscious Wavefront Scheduling,\u201d Proc. ACM\/IEEE 45th Annu. Int. Symp. Microarchitecture, Vancouver, BC, Canada, pp.72-83, Dec. 2012. 10.1109\/MICRO.2012.16","DOI":"10.1109\/MICRO.2012.16"},{"key":"21","unstructured":"[21] NVIDIA, \u201cCUDA samples,\u201d https:\/\/github.com\/nvidia\/cuda-samples, accessed May 15. 2024."},{"key":"22","doi-asserted-by":"publisher","unstructured":"[22] J. Leng, T. Hetherington, A. ElTantawy, S. Gilani, N.S. Kim, T.M. Aamodt, and V.J. Reddi, \u201cGPUWattch: enabling energy optimizations in GPGPUs,\u201d Proc. ACM\/IEEE 40th Annu. Int. Symp. Comput. Architecture, pp.487-498, June 2013. 10.1145\/2508148.2485964","DOI":"10.1145\/2508148.2485964"},{"key":"23","doi-asserted-by":"crossref","unstructured":"[23] K. Kim, S. Lee, M.K. Yoon, G. Koo, W.W. Ro, and M. Annavaram, \u201cWarped-preexecution: A GPU pre-execution approach for improving latency hiding,\u201d Proc. IEEE Int. Symp. High Performance Comput. Architecture, Barcelona, Spain, pp.163-175, March 2016. 10.1109\/HPCA.2016.7446062","DOI":"10.1109\/HPCA.2016.7446062"},{"key":"24","doi-asserted-by":"publisher","unstructured":"[24] X. Gong, X. Gong, L. Yu, and D. Kaeli, \u201cHAWS: Accelerating GPU wavefront execution through selective out-of-order execution,\u201d ACM Trans. on Architecture and Code Optimization, vol.16, no.2, pp.1-22, 2019. 10.1145\/3291050","DOI":"10.1145\/3291050"}],"container-title":["IEICE Transactions on Information and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.jstage.jst.go.jp\/article\/transinf\/E108.D\/6\/E108.D_2024EDP7203\/_pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,7]],"date-time":"2025-06-07T03:42:33Z","timestamp":1749267753000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.jstage.jst.go.jp\/article\/transinf\/E108.D\/6\/E108.D_2024EDP7203\/_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,1]]},"references-count":24,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2025]]}},"URL":"https:\/\/doi.org\/10.1587\/transinf.2024edp7203","relation":{},"ISSN":["0916-8532","1745-1361"],"issn-type":[{"type":"print","value":"0916-8532"},{"type":"electronic","value":"1745-1361"}],"subject":[],"published":{"date-parts":[[2025,6,1]]},"article-number":"2024EDP7203"}}