{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T03:47:13Z","timestamp":1772164033489,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":41,"publisher":"ACM","license":[{"start":{"date-parts":[[2013,6,23]],"date-time":"2013-06-23T00:00:00Z","timestamp":1371945600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2013,6,23]]},"DOI":"10.1145\/2485922.2485953","type":"proceedings-article","created":{"date-parts":[[2013,6,25]],"date-time":"2013-06-25T15:13:21Z","timestamp":1372173201000},"page":"356-367","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":33,"title":["Maximizing SIMD resource utilization in GPGPUs with SIMD lane permutation"],"prefix":"10.1145","author":[{"given":"Minsoo","family":"Rhu","sequence":"first","affiliation":[{"name":"The University of Texas at Austin"}]},{"given":"Mattan","family":"Erez","sequence":"additional","affiliation":[{"name":"The University of Texas at Austin"}]}],"member":"320","published-online":{"date-parts":[[2013,6,23]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"GPGPU-Sim. http:\/\/www.gpgpu-sim.org.  GPGPU-Sim. http:\/\/www.gpgpu-sim.org."},{"key":"e_1_3_2_1_2_1","unstructured":"GPGPU-Sim Manual. http:\/\/www.gpgpu-sim.org\/manual.  GPGPU-Sim Manual. http:\/\/www.gpgpu-sim.org\/manual."},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/567067.567085"},{"key":"e_1_3_2_1_4_1","volume-title":"AMD Radeon HD 6900M Series Specifications","author":"AMD Corporation","year":"2010","unstructured":"AMD Corporation . AMD Radeon HD 6900M Series Specifications , 2010 . AMD Corporation. AMD Radeon HD 6900M Series Specifications, 2010."},{"key":"e_1_3_2_1_5_1","volume-title":"August","author":"AMD Corporation","year":"2010","unstructured":"AMD Corporation . ATI Stream Computing OpenCL Programming Guide , August 2010 . AMD Corporation. ATI Stream Computing OpenCL Programming Guide, August 2010."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/161541.161736"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2009.4919648"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/PROC.1972.8647"},{"key":"e_1_3_2_1_9_1","volume-title":"Simultaneous Branch and Warp Interweaving for Sustained GPU Performance. In 39th International Symposium on Computer Architecture (ISCA-39)","author":"Brunie N.","year":"2012","unstructured":"N. Brunie , S. Collange , and G. Diamos . Simultaneous Branch and Warp Interweaving for Sustained GPU Performance. In 39th International Symposium on Computer Architecture (ISCA-39) , June 2012 . N. Brunie, S. Collange, and G. Diamos. Simultaneous Branch and Warp Interweaving for Sustained GPU Performance. In 39th International Symposium on Computer Architecture (ISCA-39), June 2012."},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2009.5306797"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/2155620.2155676"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/1854273.1854318"},{"key":"e_1_3_2_1_13_1","volume-title":"Microeconomic Algorithms for Load Balancing in Distributed Computer Systems. In 8th International Conference on Distributed Computing Systems","author":"Ferguson D.","year":"1988","unstructured":"D. Ferguson , Y. Yemini , and C. Nikolaou . Microeconomic Algorithms for Load Balancing in Distributed Computer Systems. In 8th International Conference on Distributed Computing Systems , 1988 . D. Ferguson, Y. Yemini, and C. Nikolaou. Microeconomic Algorithms for Load Balancing in Distributed Computer Systems. In 8th International Conference on Distributed Computing Systems, 1988."},{"key":"e_1_3_2_1_14_1","volume-title":"Thread Block Compaction for Efficient SIMT Control Flow. In 17th International Symposium on High Performance Computer Architecture (HPCA-17)","author":"Fung W. W.","year":"2011","unstructured":"W. W. Fung and T. M. Aamodt . Thread Block Compaction for Efficient SIMT Control Flow. In 17th International Symposium on High Performance Computer Architecture (HPCA-17) , February 2011 . W. W. Fung and T. M. Aamodt. Thread Block Compaction for Efficient SIMT Control Flow. In 17th International Symposium on High Performance Computer Architecture (HPCA-17), February 2011."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2007.12"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2010.51"},{"key":"e_1_3_2_1_17_1","volume-title":"Jacobi iteration for a Laplace discretisation on a 3D structured grid","author":"Giles M.","year":"2008","unstructured":"M. Giles . Jacobi iteration for a Laplace discretisation on a 3D structured grid , 2008 . M. Giles. Jacobi iteration for a Laplace discretisation on a 3D structured grid, 2008."},{"key":"e_1_3_2_1_18_1","unstructured":"M. Giles and S. Xiaoke. Notes on using the NVIDIA 8800 GTX graphics card. http:\/\/people.maths.ox.ac.uk\/gilesm\/hpc\/ 2008.  M. Giles and S. Xiaoke. Notes on using the NVIDIA 8800 GTX graphics card. http:\/\/people.maths.ox.ac.uk\/gilesm\/hpc\/ 2008."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.5555\/1782174.1782200"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/360128.360145"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNET.2011.2172953"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.5555\/580550.876450"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/2370816.2370869"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/1815961.1815992"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2155620.2155656"},{"key":"e_1_3_2_1_26_1","volume-title":"NVIDIA's Next Generation CUDA Compute Architecture: Fermi","author":"NVIDIA Corporation","year":"2009","unstructured":"NVIDIA Corporation . NVIDIA's Next Generation CUDA Compute Architecture: Fermi , 2009 . NVIDIA Corporation. NVIDIA's Next Generation CUDA Compute Architecture: Fermi, 2009."},{"key":"e_1_3_2_1_27_1","unstructured":"NVIDIA Corporation. CUDA C\/C++ SDK CODE Samples 2011.  NVIDIA Corporation. CUDA C\/C++ SDK CODE Samples 2011."},{"key":"e_1_3_2_1_28_1","volume-title":"NVIDIA CUDA Programming Guide","author":"NVIDIA Corporation","year":"2011","unstructured":"NVIDIA Corporation . NVIDIA CUDA Programming Guide , 2011 . NVIDIA Corporation. NVIDIA CUDA Programming Guide, 2011."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/2304576.2304601"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/115952.115961"},{"key":"e_1_3_2_1_31_1","volume-title":"CAPRI: Prediction of Compaction-Adequacy for Handling Control-Divergence in GPGPU Architectures. In 39th International Symposium on Computer Architecture (ISCA-39)","author":"Rhu M.","year":"2012","unstructured":"M. Rhu and M. Erez . CAPRI: Prediction of Compaction-Adequacy for Handling Control-Divergence in GPGPU Architectures. In 39th International Symposium on Computer Architecture (ISCA-39) , June 2012 . M. Rhu and M. Erez. CAPRI: Prediction of Compaction-Adequacy for Handling Control-Divergence in GPGPU Architectures. In 39th International Symposium on Computer Architecture (ISCA-39), June 2012."},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2013.6522352"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/359327.359336"},{"key":"e_1_3_2_1_34_1","volume-title":"High-throughput sequence alignment using graphics processing units. BMC Bioinformatics, 8(1):474","author":"Schatz M.","year":"2007","unstructured":"M. Schatz , C. Trapnell , A. Delcher , and A. Varshney . High-throughput sequence alignment using graphics processing units. BMC Bioinformatics, 8(1):474 , 2007 . M. Schatz, C. Trapnell, A. Delcher, and A. Varshney. High-throughput sequence alignment using graphics processing units. BMC Bioinformatics, 8(1):474, 2007."},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/40.988685"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/339647.339693"},{"key":"e_1_3_2_1_37_1","volume-title":"Morgan Kaufmann","author":"Muchnick Steven","year":"1997","unstructured":"Steven Muchnick . Advanced Compiler Design and Implementation . Morgan Kaufmann , 1997 . Steven Muchnick. Advanced Compiler Design and Implementation. Morgan Kaufmann, 1997."},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/1654059.1654082"},{"key":"e_1_3_2_1_39_1","volume-title":"Characterization and Transformation of Unstructured Control Flow in GPU Applications. In 1st International Workshop on Characterizing Applications for Heterogeneous Exascale Systems","author":"Wu H.","year":"2011","unstructured":"H. Wu , G. Diamos , S. Li , and S. Yalamanchili . Characterization and Transformation of Unstructured Control Flow in GPU Applications. In 1st International Workshop on Characterizing Applications for Heterogeneous Exascale Systems , June 2011 . H. Wu, G. Diamos, S. Li, and S. Yalamanchili. Characterization and Transformation of Unstructured Control Flow in GPU Applications. In 1st International Workshop on Characterizing Applications for Heterogeneous Exascale Systems, June 2011."},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/1810085.1810104"},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/360128.360134"}],"event":{"name":"ISCA'13: The 40th Annual International Symposium on Computer Architecture","location":"Tel-Aviv Israel","acronym":"ISCA'13","sponsor":["IEEE CS","SIGARCH ACM Special Interest Group on Computer Architecture"]},"container-title":["Proceedings of the 40th Annual International Symposium on Computer Architecture"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2485922.2485953","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2485922.2485953","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:48:43Z","timestamp":1750222123000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2485922.2485953"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,6,23]]},"references-count":41,"alternative-id":["10.1145\/2485922.2485953","10.1145\/2485922"],"URL":"https:\/\/doi.org\/10.1145\/2485922.2485953","relation":{"is-identical-to":[{"id-type":"doi","id":"10.1145\/2508148.2485953","asserted-by":"object"}]},"subject":[],"published":{"date-parts":[[2013,6,23]]},"assertion":[{"value":"2013-06-23","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}