{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,8]],"date-time":"2026-01-08T04:54:45Z","timestamp":1767848085871,"version":"3.49.0"},"reference-count":37,"publisher":"Springer Science and Business Media LLC","issue":"10","license":[{"start":{"date-parts":[[2022,3,7]],"date-time":"2022-03-07T00:00:00Z","timestamp":1646611200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.springer.com\/tdm"},{"start":{"date-parts":[[2022,3,7]],"date-time":"2022-03-07T00:00:00Z","timestamp":1646611200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.springer.com\/tdm"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Supercomput"],"published-print":{"date-parts":[[2022,7]]},"DOI":"10.1007\/s11227-022-04359-w","type":"journal-article","created":{"date-parts":[[2022,3,7]],"date-time":"2022-03-07T11:07:26Z","timestamp":1646651246000},"page":"12553-12588","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Vectorizing divergent control flow with active-lane consolidation on long-vector architectures"],"prefix":"10.1007","volume":"78","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5064-1293","authenticated-orcid":false,"given":"Wyatt","family":"Praharenka","sequence":"first","affiliation":[]},{"given":"David","family":"Pankratz","sequence":"additional","affiliation":[]},{"given":"Jo\u00e3o P. L.","family":"De Carvalho","sequence":"additional","affiliation":[]},{"given":"Ehsan","family":"Amiri","sequence":"additional","affiliation":[]},{"given":"Jos\u00e9 Nelson","family":"Amaral","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,3,7]]},"reference":[{"issue":"1","key":"4359_CR1","doi-asserted-by":"publisher","first-page":"16","DOI":"10.1145\/3433954","volume":"64","author":"D Monroe","year":"2020","unstructured":"Monroe D (2020) Fugaku takes the lead. Commun ACM 64(1):16\u201318","journal-title":"Commun ACM"},{"key":"4359_CR2","doi-asserted-by":"crossref","unstructured":"Allen, JR, Kennedy, K, Porterfield, C, Warren, J (1983) Conversion of control dependence to data dependence. In: Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on principles of programming languages, pp 177\u2013189","DOI":"10.1145\/567067.567085"},{"key":"4359_CR3","doi-asserted-by":"crossref","unstructured":"Barredo A, Cebrian JM, Moret\u00f3 M, Casas M, Valero M (2020) Improving predication efficiency through compaction\/restoration of simd instructions. In: 2020 IEEE international symposium on high performance computer architecture (HPCA), pp 717\u2013728","DOI":"10.1109\/HPCA47549.2020.00064"},{"key":"4359_CR4","doi-asserted-by":"crossref","unstructured":"Jaewook S (2007) Introducing control flow into vectorized code. In: 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007), pp 280\u2013291. IEEE","DOI":"10.1109\/PACT.2007.4336219"},{"issue":"4","key":"4359_CR5","doi-asserted-by":"publisher","first-page":"235","DOI":"10.1016\/j.micpro.2009.02.002","volume":"33","author":"J Shin","year":"2009","unstructured":"Shin J, Hall MW, Chame J (2009) Evaluating compiler technology for control-flow optimizations for multimedia extension architectures. Microprocess Microsyst 33(4):235\u2013243","journal-title":"Microprocess Microsyst"},{"issue":"9","key":"4359_CR6","doi-asserted-by":"publisher","first-page":"948","DOI":"10.1109\/TC.1972.5009071","volume":"C\u201321","author":"MJ Flynn","year":"1972","unstructured":"Flynn MJ (1972) Some computer organizations and their effectiveness. IEEE Trans Comput C\u201321(9):948\u2013960","journal-title":"IEEE Trans Comput"},{"key":"4359_CR7","unstructured":"Intel Corporation (2021) Intel AVX-512. https:\/\/www.intel.com\/content\/www\/us\/en\/architecture-and-technology\/avx-512-overview.html"},{"key":"4359_CR8","unstructured":"ARM Corporation (2021) ARM Advanced SIMD. https:\/\/developer.arm.com\/architectures\/instruction-sets\/simd-isas\/neon"},{"key":"4359_CR9","unstructured":"Arm Limited (2021) Arm\u00aeArchitecture Reference Manual Armv8, for Armv8-A Architecture Profile"},{"issue":"1","key":"4359_CR10","doi-asserted-by":"publisher","first-page":"63","DOI":"10.1145\/359327.359336","volume":"21","author":"RM Russell","year":"1978","unstructured":"Russell RM (1978) The CRAY-1 computer system. Commun ACM 21(1):63\u201372","journal-title":"Commun ACM"},{"key":"4359_CR11","unstructured":"David Patterson (2017) SIMD Instructions Considered Harmful. https:\/\/www.sigarch.org\/simd-instructions-considered-harmful"},{"key":"4359_CR12","unstructured":"Arm Limited (2021) Arm\u00aeArchitecture Reference Manual Supplement The Scalable Vector Extension (SVE), for Armv8-A"},{"key":"4359_CR13","unstructured":"RISC-V\u00ae\u00a0International Members (2021) The RISC-V \u201cV\u201d vector extension. version 0.10 (Visited on April 26, 2021). https:\/\/github.com\/riscv\/riscv-v-spec\/releases\/download\/v0.10\/riscv-v-spec-0.10.pdf"},{"issue":"4","key":"4359_CR14","doi-asserted-by":"publisher","first-page":"363","DOI":"10.1023\/A:1007559022013","volume":"28","author":"N Sreraman","year":"2000","unstructured":"Sreraman N, Govindarajan R (2000) A vectorizing compiler for multimedia extensions. Int J Parallel Prog 28(4):363\u2013400","journal-title":"Int J Parallel Prog"},{"key":"4359_CR15","volume-title":"Optimizing compilers for modern architectures: a dependence-based approach","author":"K Kennedy","year":"2001","unstructured":"Kennedy K, Allen JR (2001) Optimizing compilers for modern architectures: a dependence-based approach. Morgan Kaufmann Publishers Inc., Massachusetts"},{"key":"4359_CR16","volume-title":"High performance compilers for parallel computing","author":"MJ Wolfe","year":"1995","unstructured":"Wolfe MJ (1995) High performance compilers for parallel computing. Addison-Wesley Longman Publishing Co. Inc, New York"},{"issue":"4","key":"4359_CR17","doi-asserted-by":"publisher","first-page":"543","DOI":"10.1145\/3296979.3192413","volume":"53","author":"S Moll","year":"2018","unstructured":"Moll S, Hack S (2018) Partial control-flow linearization. ACM SIGPLAN Notices 53(4):543\u2013556","journal-title":"ACM SIGPLAN Notices"},{"key":"4359_CR18","volume-title":"A catalogue of optimizing transformations","author":"F Allen","year":"1971","unstructured":"Allen F, Cocke J (1971) A catalogue of optimizing transformations. Prentice-Hall, New Jersey"},{"key":"4359_CR19","doi-asserted-by":"publisher","first-page":"133","DOI":"10.1007\/978-3-642-54807-9_8","volume-title":"Compiler construction","author":"J Anantpur","year":"2014","unstructured":"Anantpur J, Govindarajan R (2014) Taming control divergence in gpus through control flow linearization. In: Albert C (ed) Compiler construction. Springer, Berlin Heidelberg, pp 133\u2013153"},{"key":"4359_CR20","doi-asserted-by":"crossref","unstructured":"Sun H, Gorlatch S, Zhao R (2018) Refactoring loops with nested ifs for simd extensions without masked instructions. In: European Conference on Parallel Processing, pp 769\u2013781. Springer","DOI":"10.1007\/978-3-030-10549-5_60"},{"key":"4359_CR21","doi-asserted-by":"crossref","unstructured":"Sun, H, Fey F, Zhao J, Gorlatch S (2019) WCCV: improving the vectorization of IF-statements with warp-coherent conditions. In: Proceedings of the ACM International Conference on Supercomputing, pp 319\u2013329","DOI":"10.1145\/3330345.3331059"},{"key":"4359_CR22","unstructured":"ARM (2020) The arm C language extensions https:\/\/developer.arm.com\/architectures\/system-architectures\/software-standards\/acle"},{"key":"4359_CR23","unstructured":"Fujitsu Limited (2021) A64FX\u00aeMicroarchitecture Manual. Version 1.4"},{"key":"4359_CR24","unstructured":"ARM (2020) The ARM instruction emulator. https:\/\/developer.arm.com\/tools-and-software\/server-and-hpc\/compile\/arm-instruction-emulator"},{"key":"4359_CR25","unstructured":"Bruening D, Amarasinghe S (2004) Efficient, transparent, and comprehensive runtime code manipulation. PhD thesis, Massachusetts Institute of Technology, Department of Electrical Engineering"},{"key":"4359_CR26","unstructured":"SPEC (2021) SPEC2017 Benchmark overview. https:\/\/www.spec.org\/cpu2017\/Docs\/overview.html"},{"key":"4359_CR27","doi-asserted-by":"crossref","unstructured":"Coutinho B, Sampaio D, Pereira FMQ, Meira\u00a0Jr W (2011) Divergence analysis and optimizations. In: 2011 International Conference on Parallel Architectures and Compilation Techniques, pp 320\u2013329. IEEE","DOI":"10.1109\/PACT.2011.63"},{"issue":"2","key":"4359_CR28","doi-asserted-by":"publisher","first-page":"757","DOI":"10.1007\/s00778-019-00547-y","volume":"29","author":"H Lang","year":"2020","unstructured":"Lang H, Passing L, Kipf A, Boncz P, Neumann T, Kemper A (2020) Make the most out of your SIMD investments: counter control flow divergence in compiled query pipelines. VLDB J 29(2):757\u2013774","journal-title":"VLDB J"},{"key":"4359_CR29","doi-asserted-by":"crossref","unstructured":"Fung WWL, Sham I, Yuan G, Aamodt TM (2007) Dynamic warp formation and scheduling for efficient gpu control flow. In: 40th annual IEEE\/ACM international symposium on microarchitecture (MICRO 2007), pp 407\u2013420. IEEE","DOI":"10.1109\/MICRO.2007.30"},{"key":"4359_CR30","doi-asserted-by":"crossref","unstructured":"Fung WWL, Aamodt TM (2011) Thread block compaction for efficient simt control flow. In: 2011 IEEE 17th international symposium on high performance computer architecture, pp 25\u201336. IEEE,","DOI":"10.1109\/HPCA.2011.5749714"},{"key":"4359_CR31","doi-asserted-by":"crossref","unstructured":"Khorasani F, Gupta R, Bhuyan LN (2015) Efficient warp execution in presence of divergence with collaborative context collection. In: Proceedings of the 48th international symposium on microarchitecture, MICRO-48, pp 204-215","DOI":"10.1145\/2830772.2830796"},{"issue":"2","key":"4359_CR32","doi-asserted-by":"publisher","first-page":"26","DOI":"10.1109\/MM.2017.35","volume":"37","author":"N Stephens","year":"2017","unstructured":"Stephens N, Biles S, Boettcher M, Eapen J, Eyole M, Gabrielli G, Horsnell M, Magklis G, Martinez A, Premillieu N et al (2017) The ARM scalable vector extension. IEEE Micro 37(2):26\u201339","journal-title":"IEEE Micro"},{"key":"4359_CR33","doi-asserted-by":"crossref","unstructured":"Sato M, Ishikawa Y, Tomita H, Kodama Y, Odajima T, Tsuji M, Yashiro H, Aoki M, Shida N, Miyoshi I, et al (2020) Co-design for A64FX manycore processor and \u201cFugaku\u201d. In: SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, pp 1\u201315. IEEE","DOI":"10.1109\/SC41405.2020.00051"},{"key":"4359_CR34","unstructured":"Lovett (2021) SVE in LLVM. https:\/\/hps.vi4io.org\/_media\/events\/2020\/llvm-cth20_lovett.pdf"},{"issue":"3","key":"4359_CR35","doi-asserted-by":"publisher","first-page":"2039","DOI":"10.1007\/s11227-019-02842-5","volume":"76","author":"A Armejach","year":"2020","unstructured":"Armejach A, Caminal H, Cebrian JM, Langarita R, Gonz\u00e1lez-Alberquilla R, Adeniyi-Jones C, Valero M, Casas M, Moret\u00f3 M (2020) Using Arm\u00ae scalable vector extension on stencil codes. J Supercomput 76(3):2039\u20132062","journal-title":"J Supercomput"},{"key":"4359_CR36","doi-asserted-by":"publisher","first-page":"759","DOI":"10.1007\/s11554-020-00984-x","volume":"17","author":"M Cococcioni","year":"2020","unstructured":"Cococcioni M, Rossi F, Ruffaldi E, Saponara S (2020) Fast deep neural networks for image processing using posits and arm scalable vector extension. J Real-Time Image Process 17:759\u2013771","journal-title":"J Real-Time Image Process"},{"key":"4359_CR37","doi-asserted-by":"crossref","unstructured":"Chen C, Xiang X, Liu C, Shang Y, Guo R, Liu D, Lu Y, Hao Z, Luo J, Chen Z, et al (2020) Xuantie-910: a commercial multi-core 12-stage pipeline out-of-order 64-bit high performance RISC-V processor with vector extension: industrial product. In: 2020 ACM\/IEEE 47th annual international symposium on computer architecture (ISCA), pp 52\u201364. IEEE","DOI":"10.1109\/ISCA45697.2020.00016"}],"container-title":["The Journal of Supercomputing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11227-022-04359-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11227-022-04359-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11227-022-04359-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,19]],"date-time":"2024-09-19T21:31:32Z","timestamp":1726781492000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11227-022-04359-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,7]]},"references-count":37,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2022,7]]}},"alternative-id":["4359"],"URL":"https:\/\/doi.org\/10.1007\/s11227-022-04359-w","relation":{},"ISSN":["0920-8542","1573-0484"],"issn-type":[{"value":"0920-8542","type":"print"},{"value":"1573-0484","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,3,7]]},"assertion":[{"value":"5 February 2022","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 March 2022","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}