{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T16:35:10Z","timestamp":1773246910722,"version":"3.50.1"},"reference-count":37,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2023,12,15]],"date-time":"2023-12-15T00:00:00Z","timestamp":1702598400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Inria, CNRS (LABRI and IMB), Universit\u00e9 de Bordeaux, Bordeaux INP"},{"name":"Conseil R\u00e9gional d\u2019Aquitaine"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2024,3,31]]},"abstract":"<jats:p>Leveraging the SIMD capability of modern CPU architectures is mandatory to take full advantage of their increased performance. To exploit this capability, binary executables must be vectorized, either manually by developers or automatically by a tool. For this reason, the compilation research community has developed several strategies for transforming scalar code into a vectorized implementation. However, most existing automatic vectorization techniques in modern compilers are designed for regular codes, leaving irregular applications with non-contiguous data access patterns at a disadvantage. In this article, we present a new tool, Autovesk, that automatically generates vectorized code from scalar code, specifically targeting irregular data access patterns. We describe how our method transforms a graph of scalar instructions into a vectorized one, using different heuristics to reduce the number or cost of instructions. Finally, we demonstrate the effectiveness of our approach on various computational kernels using Intel AVX-512 and ARM SVE. 
We compare the speedups of Autovesk vectorized code over GCC, Clang LLVM, and Intel automatic vectorization optimizations. We achieve competitive results on linear kernels and up to 11\u00d7 speedups on irregular kernels.<\/jats:p>","DOI":"10.1145\/3631709","type":"journal-article","created":{"date-parts":[[2023,11,9]],"date-time":"2023-11-09T11:42:17Z","timestamp":1699530137000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Autovesk: Automatic Vectorized Code Generation from Unstructured Static Kernels Using Graph Transformations"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1634-6124","authenticated-orcid":false,"given":"Hayfa","family":"Tayeb","sequence":"first","affiliation":[{"name":"ICube Lab, France and Inria, France and University of Strasbourg, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5428-8834","authenticated-orcid":false,"given":"Ludovic","family":"Paillat","sequence":"additional","affiliation":[{"name":"ICube Lab, France and Inria, France and University of Strasbourg, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0281-9709","authenticated-orcid":false,"given":"B\u00e9renger","family":"Bramas","sequence":"additional","affiliation":[{"name":"ICube Lab, France and Inria, France and University of Strasbourg, France"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2023,12,15]]},"reference":[{"key":"e_1_3_5_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/29873.29875"},{"key":"e_1_3_5_3_2","doi-asserted-by":"publisher","DOI":"10.1145\/2838735"},{"key":"e_1_3_5_4_2","doi-asserted-by":"crossref","first-page":"697","DOI":"10.1145\/2908080.2908111","volume-title":"Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation 
(PLDI\u201916)","author":"Baghsorkhi Sara S.","year":"2016","unstructured":"Sara S. Baghsorkhi, Nalini Vasudevan, and Youfeng Wu. 2016. FlexVec: Auto-vectorization for irregular loops. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI\u201916). Association for Computing Machinery, New York, NY, 697\u2013710. DOI:10.1145\/2908080.2908111"},{"key":"e_1_3_5_5_2","doi-asserted-by":"crossref","first-page":"209","DOI":"10.1007\/978-3-540-24644-2_14","volume-title":"Languages and Compilers for Parallel Computing","author":"Bastoul C\u00e9dric","year":"2004","unstructured":"C\u00e9dric Bastoul, Albert Cohen, Sylvain Girbal, Saurabh Sharma, and Olivier Temam. 2004. Putting polyhedral loop transformations to work. In Languages and Compilers for Parallel Computing, Lawrence Rauchwerger (Ed.). Springer Berlin, 209\u2013225."},{"key":"e_1_3_5_6_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1155\/2017\/5482468","article-title":"Inastemp: A novel intrinsics-as-template library for portable SIMD-vectorization","volume":"2017","author":"Bramas Berenger","year":"2017","unstructured":"Berenger Bramas. 2017. Inastemp: A novel intrinsics-as-template library for portable SIMD-vectorization. Scient. Programm. 2017 (2017), 1\u201318.","journal-title":"Scient. Programm."},{"key":"e_1_3_5_7_2","doi-asserted-by":"publisher","DOI":"10.7717\/peerj-cs.151"},{"key":"e_1_3_5_8_2","first-page":"902","volume-title":"Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201921)","author":"Chen Yishen","year":"2021","unstructured":"Yishen Chen, Charith Mendis, Michael Carbin, and Saman Amarasinghe. 2021. VeGen: A vectorizer generator for SIMD and beyond. In Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201921). 
Association for Computing Machinery, New York, NY, 902\u2013914. DOI:10.1145\/3445814.3446692"},{"key":"e_1_3_5_9_2","volume-title":"Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC\u201922)","author":"Cheshmi Kazem","year":"2022","unstructured":"Kazem Cheshmi, Zachary Cetinic, and Maryam Mehri Dehnavi. 2022. Vectorizing sparse matrix computations with partially-strided codelets. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC\u201922). IEEE Press, Article 32, 15 pages."},{"key":"e_1_3_5_10_2","first-page":"19","volume-title":"Proceedings of the 17th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT\u201916)","author":"Dong Wang","year":"2016","unstructured":"Wang Dong, Zhao Rongcai, Wang Qi, and Li Yingying. 2016. Outer-loop auto-vectorization for SIMD architectures based on Open64 compiler. In Proceedings of the 17th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT\u201916). IEEE Computer Society, Washington, DC, 19\u201323."},{"key":"e_1_3_5_11_2","doi-asserted-by":"crossref","first-page":"82","DOI":"10.1145\/996841.996853","volume-title":"Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI\u201904)","author":"Eichenberger Alexandre E.","year":"2004","unstructured":"Alexandre E. Eichenberger, Peng Wu, and Kevin O\u2019Brien. 2004. Vectorization for SIMD architectures with alignment constraints. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI\u201904). Association for Computing Machinery, New York, NY, 82\u201393. DOI:10.1145\/996841.996853"},{"key":"e_1_3_5_12_2","volume-title":"Advanced Concepts for Intelligent Vision Systems","author":"Falcou Jo\u00ebl","year":"2004","unstructured":"Jo\u00ebl Falcou and Jocelyn Serot. 2004. 
Application of template-based metaprogramming compilation techniques to the efficient implementation of image processing algorithms on SIMD-capable processors. In Advanced Concepts for Intelligent Vision Systems, VUB Press, Brussels."},{"issue":"9","key":"e_1_3_5_13_2","doi-asserted-by":"crossref","first-page":"948","DOI":"10.1109\/TC.1972.5009071","article-title":"Some computer organizations and their effectiveness","volume":"21","author":"Flynn Michael J.","year":"1972","unstructured":"Michael J. Flynn. 1972. Some computer organizations and their effectiveness. IEEE Trans. Comput. C-21, 9 (1972), 948\u2013960.","journal-title":"IEEE Trans. Comput."},{"key":"e_1_3_5_14_2","first-page":"848","volume-title":"Proceedings of the International Conference on High Performance Computing & Simulation (HPCS\u201916)","author":"Gross Matthias","year":"2016","unstructured":"Matthias Gross. 2016. Neat SIMD: Elegant vectorization in C++ by using specialized templates. In Proceedings of the International Conference on High Performance Computing & Simulation (HPCS\u201916). IEEE, 848\u2013857. DOI:10.1109\/HPCSim.2016.7568423"},{"key":"e_1_3_5_15_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10766-016-0480-z"},{"key":"e_1_3_5_16_2","first-page":"175","volume-title":"Proceedings of the International Symposium on Code Generation and Optimization (CGO\u201918)","author":"Jiang Peng","year":"2018","unstructured":"Peng Jiang and Gagan Agrawal. 2018. Conflict-free vectorization of associative irregular applications with recent SIMD architectural advances. In Proceedings of the International Symposium on Code Generation and Optimization (CGO\u201918). Association for Computing Machinery, New York, NY, 175\u2013187. DOI:10.1145\/3168827"},{"key":"e_1_3_5_17_2","volume-title":"Proceedings of the International Conference on Supercomputing (ICS\u201916)","author":"Jiang Peng","year":"2016","unstructured":"Peng Jiang, Linchuan Chen, and Gagan Agrawal. 2016. 
Reusing data reorganization for efficient SIMD parallelization of adaptive irregular applications. In Proceedings of the International Conference on Supercomputing (ICS\u201916). Association for Computing Machinery, New York, NY, Article 16, 10 pages. DOI:10.1145\/2925426.2926285"},{"key":"e_1_3_5_18_2","volume-title":"Optimizing Compilers for Modern Architectures: A Dependence-based Approach","author":"Kennedy Ken","year":"2001","unstructured":"Ken Kennedy and John R. Allen. 2001. Optimizing Compilers for Modern Architectures: A Dependence-based Approach. Morgan Kaufmann Publishers Inc., San Francisco, CA."},{"key":"e_1_3_5_19_2","doi-asserted-by":"crossref","first-page":"145","DOI":"10.1145\/349299.349320","volume-title":"Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI\u201900)","author":"Larsen Samuel","year":"2000","unstructured":"Samuel Larsen and Saman Amarasinghe. 2000. Exploiting superword level parallelism with multimedia instruction sets. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI\u201900). Association for Computing Machinery, New York, NY, 145\u2013156. DOI:10.1145\/349299.349320"},{"key":"e_1_3_5_20_2","first-page":"17","volume-title":"Proceedings of the Workshop on Programming Models for SIMD\/Vector Processing (WPMVP\u201914)","author":"Lei\u00dfa Roland","year":"2014","unstructured":"Roland Lei\u00dfa, Immanuel Haffner, and Sebastian Hack. 2014. Sierra: A SIMD extension for C++. In Proceedings of the Workshop on Programming Models for SIMD\/Vector Processing (WPMVP\u201914). Association for Computing Machinery, New York, NY, 17\u201324. DOI:10.1145\/2568058.2568062"},{"key":"e_1_3_5_21_2","volume-title":"Proceedings of the 5th Workshop on Programming Models for SIMD\/Vector Processing (WPMVP\u201919)","author":"Moll Simon","year":"2019","unstructured":"Simon Moll, Shrey Sharma, Matthias Kurtenacker, and Sebastian Hack. 2019. 
Multi-dimensional vectorization in LLVM. In Proceedings of the 5th Workshop on Programming Models for SIMD\/Vector Processing (WPMVP\u201919). Association for Computing Machinery, New York, NY, Article 3, 8 pages. DOI:10.1145\/3303117.3306172"},{"key":"e_1_3_5_22_2","unstructured":"Ralf M\u00f6ller. 2016. Design of a low-level C++ template SIMD library. Bielefeld University, Faculty of Technology, Computer Engineering Group."},{"key":"e_1_3_5_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/1133255.1133997"},{"key":"e_1_3_5_24_2","doi-asserted-by":"crossref","first-page":"2","DOI":"10.1145\/1454115.1454119","volume-title":"Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques (PACT\u201908)","author":"Nuzman Dorit","year":"2008","unstructured":"Dorit Nuzman and Ayal Zaks. 2008. Outer-loop vectorization: Revisited for short SIMD architectures. In Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques (PACT\u201908). Association for Computing Machinery, New York, NY, 2\u201311. DOI:10.1145\/1454115.1454119"},{"key":"e_1_3_5_25_2","unstructured":"Sebastian Pop, Albert Cohen, C\u00e9dric Bastoul, Sylvain Girbal, Georges-Andr\u00e9 Silber, and Nicolas Vasilache. 2006. GRAPHITE: Polyhedral analyses and optimizations for GCC. In Proceedings of the GCC Developers\u2019 Summit 2006, Ottawa, ON, 260."},{"key":"e_1_3_5_26_2","first-page":"206","volume-title":"Proceedings of the IEEE\/ACM International Symposium on Code Generation and Optimization (CGO\u201919)","author":"Porpodas Vasileios","year":"2019","unstructured":"Vasileios Porpodas, Rodrigo C. O. Rocha, Evgueni Brevnov, Lu\u00eds F. W. G\u00f3es, and Timothy Mattson. 2019. Super-node SLP: Optimized vectorization for code sequences containing operators and their inverse elements. In Proceedings of the IEEE\/ACM International Symposium on Code Generation and Optimization (CGO\u201919). IEEE, Washington, DC, 206\u2013216. 
DOI:10.1109\/CGO.2019.8661192"},{"key":"e_1_3_5_27_2","first-page":"163","volume-title":"Proceedings of the International Symposium on Code Generation and Optimization (CGO\u201918)","author":"Porpodas Vasileios","year":"2018","unstructured":"Vasileios Porpodas, Rodrigo C. O. Rocha, and Lu\u00eds F. W. G\u00f3es. 2018. Look-ahead SLP: Auto-vectorization in the presence of commutative operations. In Proceedings of the International Symposium on Code Generation and Optimization (CGO\u201918). Association for Computing Machinery, New York, NY, 163\u2013174. DOI:10.1145\/3168807"},{"key":"e_1_3_5_28_2","volume-title":"Numerical Recipes 3rd Edition: The Art of Scientific Computing (3rd ed.)","author":"Press William H.","year":"2007","unstructured":"William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery. 2007. Numerical Recipes 3rd Edition: The Art of Scientific Computing (3rd ed.). Cambridge University Press."},{"key":"e_1_3_5_29_2","first-page":"1","volume-title":"Proceedings of the 29th International Conference on Compiler Construction (CC\u201920)","author":"Rocha Rodrigo C. O.","year":"2020","unstructured":"Rodrigo C. O. Rocha, Vasileios Porpodas, Pavlos Petoumenos, Lu\u00eds F. W. G\u00f3es, Zheng Wang, Murray Cole, and Hugh Leather. 2020. Vectorization-aware loop unrolling with seed forwarding. In Proceedings of the 29th International Conference on Compiler Construction (CC\u201920). Association for Computing Machinery, New York, NY, 1\u201313. DOI:10.1145\/3377555.3377890"},{"key":"e_1_3_5_30_2","first-page":"131","article-title":"Loop-aware SLP in GCC","author":"Rosen Ira","year":"2007","unstructured":"Ira Rosen, D. Nuzman, and A. Zaks. 2007. Loop-aware SLP in GCC. GCC Devel. Summ. (01 2007), 131\u2013142.","journal-title":"GCC Devel. Summ."},{"key":"e_1_3_5_31_2","doi-asserted-by":"publisher","DOI":"10.1147\/rd.302.0163"},{"key":"e_1_3_5_32_2","doi-asserted-by":"crossref","unstructured":"P. Souza, L. Borges, C. Andreolli, and P. Thierry. 
2015. OpenVec portable SIMD intrinsics. Second EAGE Workshop on High Performance Computing for Upstream 2015 1 (2015) 1\u20135. DOI:https:\/\/doi.org\/10.3997\/2214-4609.201414038","DOI":"10.3997\/2214-4609.201414038"},{"key":"e_1_3_5_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2018.2857721"},{"key":"e_1_3_5_34_2","first-page":"353","volume-title":"Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques","author":"Sujon Majedul Haque","year":"2013","unstructured":"Majedul Haque Sujon, R. Clint Whaley, and Qing Yi. 2013. Vectorization past dependent branches through speculation. In Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques. Association for Computing Machinery, New York, NY, 353\u2013362. DOI:10.1109\/PACT.2013.6618831"},{"key":"e_1_3_5_35_2","doi-asserted-by":"publisher","DOI":"10.1145\/2838734"},{"key":"e_1_3_5_36_2","first-page":"327","volume-title":"Proceedings of the 18th International Conference on Parallel Architectures and Compilation Techniques","author":"Trifunovic Konrad","year":"2009","unstructured":"Konrad Trifunovic, Dorit Nuzman, Albert Cohen, Ayal Zaks, and Ira Rosen. 2009. Polyhedral-model guided loop-nest auto-vectorization. In Proceedings of the 18th International Conference on Parallel Architectures and Compilation Techniques. IEEE Computer Society, 327\u2013337. DOI:10.1109\/PACT.2009.18"},{"key":"e_1_3_5_37_2","first-page":"9","volume-title":"Proceedings of the Workshop on Programming Models for SIMD\/Vector Processing (WPMVP\u201914)","author":"Wang Haichuan","year":"2014","unstructured":"Haichuan Wang, Peng Wu, Ilie Gabriel Tanase, Mauricio J. Serrano, and Jos\u00e9 E. Moreira. 2014. Simple, portable and fast SIMD intrinsic programming: Generic SIMD library. In Proceedings of the Workshop on Programming Models for SIMD\/Vector Processing (WPMVP\u201914). Association for Computing Machinery, New York, NY, 9\u201316. 
DOI:10.1145\/2568058.2568059"},{"key":"e_1_3_5_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/3566054"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3631709","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3631709","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:35:43Z","timestamp":1750178143000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3631709"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,12,15]]},"references-count":37,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,3,31]]}},"alternative-id":["10.1145\/3631709"],"URL":"https:\/\/doi.org\/10.1145\/3631709","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,12,15]]},"assertion":[{"value":"2022-12-19","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-10-27","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-12-15","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}