{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,14]],"date-time":"2026-02-14T10:24:40Z","timestamp":1771064680637,"version":"3.50.1"},"reference-count":33,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2012,1,1]],"date-time":"2012-01-01T00:00:00Z","timestamp":1325376000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["8.12E+12"],"award-info":[{"award-number":["8.12E+12"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000143","name":"Division of Computing and Communication Foundations","doi-asserted-by":"publisher","award":["CCF-0926127"],"award-info":[{"award-number":["CCF-0926127"]}],"id":[{"id":"10.13039\/100000143","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2012,1]]},"abstract":"<jats:p>Automatic vectorization is critical to enhancing performance of compute-intensive programs on modern processors. However, there is much room for improvement over the auto-vectorization capabilities of current production compilers through careful vector-code synthesis that utilizes a variety of loop transformations (e.g., unroll-and-jam, interchange, etc.).<\/jats:p><jats:p>As the set of transformations considered is increased, the selection of the most effective combination of transformations becomes a significant challenge: Currently used cost models in vectorizing compilers are often unable to identify the best choices. In this paper, we address this problem using machine learning models to predict the performance of SIMD codes. 
In contrast to existing approaches that have used high-level features of the program, we develop machine learning models based on features extracted from the generated assembly code. The models are trained offline on a number of benchmarks and used at compile-time to discriminate between numerous possible vectorized variants generated from the input code.<\/jats:p><jats:p>We demonstrate the effectiveness of the machine learning model by using it to guide automatic vectorization on a variety of tensor contraction kernels, with improvements ranging from 2\u00d7 to 8\u00d7 over Intel ICC's auto-vectorized code. We also evaluate the effectiveness of the model on a number of stencil computations and show good improvement over auto-vectorized code.<\/jats:p>","DOI":"10.1145\/2086696.2086729","type":"journal-article","created":{"date-parts":[[2012,1,24]],"date-time":"2012-01-24T16:47:14Z","timestamp":1327423634000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":37,"title":["Using machine learning to improve automatic vectorization"],"prefix":"10.1145","volume":"8","author":[{"given":"Kevin","family":"Stock","sequence":"first","affiliation":[{"name":"The Ohio State University"}]},{"given":"Louis-No\u00ebl","family":"Pouchet","sequence":"additional","affiliation":[{"name":"The Ohio State University"}]},{"given":"P.","family":"Sadayappan","sequence":"additional","affiliation":[{"name":"The Ohio State University"}]}],"member":"320","published-online":{"date-parts":[[2012,1,26]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2006.37"},{"key":"e_1_2_1_2_1","volume-title":"Proceedings of Supercomputing.","author":"Baumgartner G.","unstructured":"Baumgartner, G., Bernholdt, D., Cociorva, D., Harrison, R., Hirata, S., Lam, C.-C., Nooijen, M., Pitzer, R., Ramanujam, J., and Sadayappan, P. 2002. 
A high-level approach to synthesis of high-performance codes for quantum chemistry. In Proceedings of Supercomputing."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.5555\/1756006.1953016"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2007.32"},{"key":"e_1_2_1_5_1","unstructured":"Chen C. Chame J. and Hall M. 2008. CHiLL: A framework for composing high-level loop transformations. Tech. rep. 08-897 University of Southern California."},{"key":"e_1_2_1_6_1","volume-title":"Proceedings of IPDPS.","author":"Chen C.","unstructured":"Chen, C., Shin, J., Kintali, S., Chame, J., and Hall, M. 2007. Model-guided empirical optimization for multimedia extension architectures: A case study. In Proceedings of IPDPS."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/314403.314414"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1015729001611"},{"key":"e_1_2_1_9_1","doi-asserted-by":"crossref","first-page":"33","DOI":"10.1002\/9780470125915.ch2","article-title":"An introduction to coupled cluster theory for computational chemists","volume":"14","author":"Crawford T.","year":"2000","unstructured":"Crawford, T. and Schaefer III, H. 
2000. An introduction to coupled cluster theory for computational chemists. In Reviews in Computational Chemistry, Vol. 14, 33--136.","journal-title":"Reviews in Computational Chemistry"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/377792.377807"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1669112.1669124"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/996841.996853"},{"key":"e_1_2_1_13_1","volume-title":"Proceedings of CC.","author":"Fireman L.","unstructured":"Fireman, L., Petrank, E., and Zaks, A. 2007. New algorithms for simd alignment. In Proceedings of CC."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1065910.1065922"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2005.9"},{"key":"e_1_2_1_16_1","first-page":"1","article-title":"Multiresolution computational chemistry","volume":"16","author":"Harrison R. J.","year":"2005","unstructured":"Harrison, R. J., Fann, G. I., Gan, Z., Yanai, T., Sugiki, S., Beste, A., and Beylkin, G. 2005. Multiresolution computational chemistry. J. Physics (Conference Series) 16, 1, 243.","journal-title":"J. Physics (Conference Series)"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1021\/jp034596z"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1509864.1509866"},{"key":"e_1_2_1_19_1","unstructured":"Kennedy K. and Allen J. 2002. Optimizing Compilers for Modern Architectures: A Dependence-Based Approach. 
Morgan Kaufmann."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.5555\/517554.825767"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/349299.349320"},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of PACT.","author":"Larsen S.","unstructured":"Larsen, S., Witchel, E., and Amarasinghe, S. P. 2002. Increasing and detecting memory address congruence. In Proceedings of PACT."},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of AIMSA. 41--50","author":"Monsifrot A.","unstructured":"Monsifrot, A., Bodin, F., and Quiniou, R. 2002. A machine learning approach to automatic production of compiler heuristics. In Proceedings of AIMSA. 41--50."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2006.25"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1133981.1133997"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/1454115.1454119"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2011.101"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2009.5161054"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2009.18"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/76263.76337"},{"key":"e_1_2_1_31_1","volume-title":"High Performance Compilers For Parallel Computing","author":"Wolfe M. J.","unstructured":"Wolfe, M. J. 1996. High Performance Compilers For Parallel Computing. 
Addison-Wesley."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-89740-8_24"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/1772954.1772982"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2086696.2086729","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2086696.2086729","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T10:06:43Z","timestamp":1750241203000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2086696.2086729"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,1]]},"references-count":33,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2012,1]]}},"alternative-id":["10.1145\/2086696.2086729"],"URL":"https:\/\/doi.org\/10.1145\/2086696.2086729","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,1]]},"assertion":[{"value":"2011-07-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2011-12-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2012-01-26","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}