{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T04:50:02Z","timestamp":1774932602104,"version":"3.50.1"},"reference-count":33,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2019,10,11]],"date-time":"2019-10-11T00:00:00Z","timestamp":1570752000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2019,12,31]]},"abstract":"<jats:p>The need for compilers to generate highly vectorized code is at an all-time high with the increasing vectorization capabilities of modern processors. To this end, the information that compilers have at their disposal, either through code analysis or via user annotations, is instrumental for auto-vectorization, and hence for the overall performance. However, the information that is available to compilers at compile time and its accuracy varies greatly, as does the resulting performance of vectorizing compilers. Benchmarks like the Test Suite for Vectorizing Compilers (TSVC) have been developed to evaluate the vectorization capability of such compilers. The overarching approach of TSVC and similar benchmarks is to evaluate the compilers under the best possible scenario (i.e., assuming that compilers have access to all useful contextual information at compile time). Although this idealistic view is useful to observe the capability of compilers for auto-vectorization, it is not a true reflection of the conditions found in real-world applications.<\/jats:p>\n          <jats:p>In this article, we propose a novel method for evaluating the auto-vectorization capability of compilers. 
Instead of assuming that compilers have access to a wealth of information at compile time, we formulate a method to objectively supply or withdraw information that would otherwise aid the compiler in the auto-vectorization process. This method is orthogonal to the approach adopted by TSVC, and as such, it provides the means of assessing the capabilities of modern vectorizing compilers in a more detailed way.<\/jats:p>\n          <jats:p>Using this new method, we exhaustively evaluated five industry-grade compilers (GNU, Intel, Clang, PGI, and IBM) on four representative vector platforms (AVX-2, AVX-512 (Skylake), AVX-512 (KNL), and AltiVec) using the modified version of TSVC and application-level proxy kernels. The results show the impact that withdrawing information has on the vectorization capabilities of each compiler and also prove the validity of the presented technique.<\/jats:p>","DOI":"10.1145\/3356842","type":"journal-article","created":{"date-parts":[[2019,10,11]],"date-time":"2019-10-11T14:53:33Z","timestamp":1570805613000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":14,"title":["Evaluating Auto-Vectorizing Compilers through Objective Withdrawal of Useful Information"],"prefix":"10.1145","volume":"16","author":[{"given":"Sergi","family":"Siso","sequence":"first","affiliation":[{"name":"Hartree Centre and University of Liverpool, BrownlowHill, Liverpool, UK"}]},{"given":"Wes","family":"Armour","sequence":"additional","affiliation":[{"name":"University of Oxford, Oxford, UK"}]},{"given":"Jeyarajan","family":"Thiyagalingam","sequence":"additional","affiliation":[{"name":"Rutherford Appleton Laboratory, Oxford, UK"}]}],"member":"320","published-online":{"date-parts":[[2019,10,11]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Proceedings of the 2016 Euromicro Conference on Digital System Design (DSD\u201916)","author":"Alvanos M.","unstructured":"M. Alvanos and P. Trancoso . 
2016. Video SIMDBench: Benchmarking the compiler vectorization for multimedia applications. In Proceedings of the 2016 Euromicro Conference on Digital System Design (DSD\u201916). 168--175."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.micpro.2018.09.002"},{"key":"e_1_2_1_3_1","volume-title":"Proceedings of the 5th Annual Workshop on Modeling, Benchmarking, and Simulation.","author":"Bienia Christian","year":"2009","unstructured":"Christian Bienia and Kai Li. 2009. PARSEC 2.0: A new benchmark suite for chip-multiprocessors. In Proceedings of the 5th Annual Workshop on Modeling, Benchmarking, and Simulation."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/2541228.2555294"},{"key":"e_1_2_1_5_1","volume-title":"Proceedings of the 1988 ACM\/IEEE Conference on Supercomputing (Supercomputing\u201988)","author":"Callahan D.","unstructured":"D. Callahan, J. Dongarra, and D. Levine. 1988. Vectorizing compilers: A test suite and results. In Proceedings of the 1988 ACM\/IEEE Conference on Supercomputing (Supercomputing\u201988).
IEEE, Los Alamitos, CA, 98--105."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2014.6844462"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00607-015-0444-y"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2381056.2381078"},{"key":"e_1_2_1_9_1","volume-title":"High Performance Computing","author":"Doerfert Johannes","unstructured":"Johannes Doerfert, Brian Homerding, and Hal Finkel. 2019. Performance exploration through optimistic static program annotations. In High Performance Computing. Springer, 247--268."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.728"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/996841.996853"},{"key":"e_1_2_1_12_1","volume-title":"Languages and Compilers for Parallel Computing","author":"Fang Jesse Z.","unstructured":"Jesse Z. Fang. 1997. Compiler algorithms on if-conversion, speculative predicates assignment and predicated code optimizations. In Languages and Compilers for Parallel Computing, D. Sehr, U. Banerjee, D. Gelernter, A. Nicolau, and D. Padua (Eds.). Springer, Berlin, Germany, 135--153."},{"key":"e_1_2_1_13_1","volume-title":"Restrict-qualified pointers in LLVM. Retrieved","author":"Finkel Hal","year":"2019","unstructured":"Hal Finkel. 2017.
Restrict-qualified pointers in LLVM. Retrieved September 4, 2019 from https:\/\/llvm.org\/devmtg\/2017-02-04\/Restrict-Qualified-Pointers-in-LLVM.pdf."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.micpro.2009.02.010"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3276496"},{"key":"e_1_2_1_16_1","volume-title":"High Performance Parallelism Pearls","author":"Henderson Tom","unstructured":"Tom Henderson, Jhon Michalakes, Indraneil Gokhale, and Ashish Jha. 2015. Numerical weather prediction optimization. In High Performance Parallelism Pearls Volume Two: Multicore and Many-Core Programming Approaches. MKF Publishers, 7--23."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-30961-8_5"},{"key":"e_1_2_1_18_1","volume-title":"LLVM Test-Suite: TSVC. Retrieved","author":"LLVM.","year":"2019","unstructured":"LLVM. 2018. LLVM Test-Suite: TSVC. Retrieved September 4, 2019 from http:\/\/llvm.org\/svn\/llvm-project\/test-suite\/trunk\/MultiSource\/Benchmarks\/TSVC\/."},{"key":"e_1_2_1_19_1","volume-title":"Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques (PACT\u201911)","author":"Maleki Saeed","unstructured":"Saeed Maleki, Yaoqing Gao, Maria J.
Garzar\u00e1n, Tommy Wong, and David A. Padua. 2011. An evaluation of vectorizing compilers. In Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques (PACT\u201911). IEEE, Los Alamitos, CA, 372--382."},{"key":"e_1_2_1_20_1","volume-title":"Kurnosov","author":"Moldovanova Olga V.","year":"2017","unstructured":"Olga V. Moldovanova and Mikhail G. Kurnosov. 2017. Auto-vectorization of loops on Intel 64 and Intel Xeon Phi: Analysis and evaluation. In Parallel Computing Technologies, V. Malyshkin (Ed.). Springer, Cham, Switzerland, 143--150."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3192366.3192413"},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the 2012 Innovative Parallel Computing Conference (InPar\u201912)","author":"Pharr M.","unstructured":"M. Pharr and W. R. Mark. 2012. ispc: A SPMD compiler for high-performance CPU programming. In Proceedings of the 2012 Innovative Parallel Computing Conference (InPar\u201912). 1--13."},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium. 89b.","author":"Ren G.","unstructured":"G. Ren, P. Wu, and D. Padua. 2005. An empirical study on the vectorization of multimedia applications for multimedia extensions.
In Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium. 89b."},{"key":"e_1_2_1_24_1","volume-title":"FY18 Proxy App Suite Release. Milestone Report for the ECP Proxy App Project. Retrieved","author":"Richards David F.","year":"2019","unstructured":"David F. Richards, Omar Aaziz, Jeanine Cook, Hal Finkel, Brian Homerding, Peter McCorquodale, Tiffany Mintz, Shirley Moore, Abhinacv Bhatele, and Robert Pavel. 2018. FY18 Proxy App Suite Release. Milestone Report for the ECP Proxy App Project. Retrieved September 4, 2019 from https:\/\/osti.gov."},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of the 17th International Conference on Languages and Compilers for High Performance Computing (LCPC\u201904)","author":"Rickett Christopher D.","unstructured":"Christopher D. Rickett, Sung-Eun Choi, and Bradford L. Chamberlain. 2005. Compiling high-level languages for vector architectures. In Proceedings of the 17th International Conference on Languages and Compilers for High Performance Computing (LCPC\u201904). 224--237."},{"key":"e_1_2_1_26_1","volume-title":"Reducing the functionality gap between auto-vectorization and explicit vectorization","author":"Saito Hideki","unstructured":"Hideki Saito, Serge Preis, Nikolay Panchenko, and Xinmin Tian. 2016. Reducing the functionality gap between auto-vectorization and explicit vectorization.
In OpenMP: Memory, Devices, and Tasks, N. Maruyama, B. R. de Supinski, and M. Wahib (Eds.). Springer, Cham, Switzerland, 173--186."},{"key":"e_1_2_1_27_1","volume-title":"Proceedings of the 2012 39th Annual International Symposium on Computer Architecture (ISCA\u201912)","author":"Satish N.","unstructured":"N. Satish, C. Kim, J. Chhugani, H. Saito, R. Krishnaiyer, M. Smelyanskiy, M. Girkar, and P. Dubey. 2012. Can traditional programming bridge the Ninja performance gap for parallel computing applications? In Proceedings of the 2012 39th Annual International Symposium on Computer Architecture (ISCA\u201912). 440--451."},{"key":"e_1_2_1_28_1","volume-title":"Proceedings of the EMerging Technology Conference.15--18","author":"Siso Sergi","unstructured":"Sergi Siso, Luke Mason, and Michael Seaton. [n.d.]. Code modernization of DLMESO LBE to achieve good performance on the Intel Xeon Phi.
In Proceedings of the EMerging Technology Conference.15--18."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2003.1220579"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2017.35"},{"key":"e_1_2_1_31_1","volume-title":"Effective SIMD vectorization for Intel Xeon Phi coprocessors. Scientific Programming 2015 (Jan","author":"Tian Xinmin","year":"2016","unstructured":"Xinmin Tian, Hideki Saito, Serguei V. Preis, Eric N. Garcia, Sergey S. Kozhukhov, Matt Masten, Aleksei G. Cherkasov, and Nikolay Panchenko. 2016. Effective SIMD vectorization for Intel Xeon Phi coprocessors. Scientific Programming 2015 (Jan. 2016), Article 1, 1 page."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.micpro.2016.11.014"},{"key":"e_1_2_1_34_1","volume-title":"Big Data Computing and Communications","author":"Zhao Bo","unstructured":"Bo Zhao, Wei Gao, Rongcai Zhao, Lin Han, Huihui Sun, and Yingying Li. 2015. Performance evaluation of NPB and SPEC CPU2006 on various SIMD extensions. In Big Data Computing and Communications, Y. Wang et al. (Eds.).
Springer, 257--272."}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3356842","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3356842","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:22:55Z","timestamp":1750202575000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3356842"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,10,11]]},"references-count":33,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2019,12,31]]}},"alternative-id":["10.1145\/3356842"],"URL":"https:\/\/doi.org\/10.1145\/3356842","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,10,11]]},"assertion":[{"value":"2019-01-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-08-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-10-11","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}