{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,30]],"date-time":"2026-04-30T11:12:42Z","timestamp":1777547562481,"version":"3.51.4"},"reference-count":44,"publisher":"Association for Computing Machinery (ACM)","issue":"OOPSLA","license":[{"start":{"date-parts":[[2018,10,24]],"date-time":"2018-10-24T00:00:00Z","timestamp":1540339200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["1533912"],"award-info":[{"award-number":["1533912"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Program. Lang."],"published-print":{"date-parts":[[2018,10,24]]},"abstract":"<jats:p>\n            Modern compiler optimization is a complex process that offers no guarantees to deliver the fastest, most efficient target code. For this reason, compilers struggle to produce a stable performance from versions of code that carry out the same computation and only differ in the order of operations. This instability makes compilers much less effective program optimization tools and often forces programmers to carry out a brute force search when tuning for performance. In this paper, we analyze the stability of the compilation process and the performance headroom of three widely used general purpose compilers: GCC, ICC, and Clang. 
For the study, we extracted over 1,000 for loop nests from well-known benchmarks, libraries, and real applications; then, we applied sequences of source-level loop transformations to these loop nests to create numerous semantically equivalent <jats:italic>mutations<\/jats:italic>; finally, we analyzed the impact of transformations on code quality in terms of locality, dynamic instruction count, and vectorization. Our results show that, by applying source-to-source transformations and searching for the best vectorization setting, the percentage of loops sped up by at least 1.15x is 46.7% for GCC, 35.7% for ICC, and 46.5% for Clang, and on average the potential for performance improvement is estimated to be at least 23.7% for GCC, 18.1% for ICC, and 26.4% for Clang. Our stability analysis shows that, under our experimental setup, the average coefficient of variation of the execution time across all mutations is 18.2% for GCC, 19.5% for ICC, and 16.9% for Clang, and the highest coefficient of variation for a single loop nest reaches 118.9% for GCC, 124.3% for ICC, and 110.5% for Clang.
We conclude that the evaluated compilers need further improvements to claim they have stable behavior.\n          <\/jats:p>","DOI":"10.1145\/3276496","type":"journal-article","created":{"date-parts":[[2018,10,24]],"date-time":"2018-10-24T11:57:18Z","timestamp":1540382238000},"page":"1-29","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":26,"title":["An empirical study of the effect of source-level loop transformations on compiler stability"],"prefix":"10.1145","volume":"2","author":[{"given":"Zhangxiaowen","family":"Gong","sequence":"first","affiliation":[{"name":"University of Illinois at Urbana-Champaign, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhi","family":"Chen","sequence":"additional","affiliation":[{"name":"University of California at Irvine, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Justin","family":"Szaday","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana-Champaign, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"David","family":"Wong","sequence":"additional","affiliation":[{"name":"Intel, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zehra","family":"Sura","sequence":"additional","affiliation":[{"name":"IBM, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Neftali","family":"Watkinson","sequence":"additional","affiliation":[{"name":"University of California at Irvine, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Saeed","family":"Maleki","sequence":"additional","affiliation":[{"name":"Microsoft, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"David","family":"Padua","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana-Champaign, 
USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Alexander","family":"Veidenbaum","sequence":"additional","affiliation":[{"name":"University of California at Irvine, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Alexandru","family":"Nicolau","sequence":"additional","affiliation":[{"name":"University of California at Irvine, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Josep","family":"Torrellas","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana-Champaign, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2018,10,24]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/MIXDES.2016.7529786"},{"key":"e_1_2_2_2_1","unstructured":"Randy Allen and Ken Kennedy. 2001. Optimizing compilers for modern architectures a dependence-based approach. (2001).  Randy Allen and Ken Kennedy. 2001. Optimizing compilers for modern architectures a dependence-based approach. (2001)."},{"key":"e_1_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1177\/109434209100500306"},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.5555\/1788374.1788386"},{"key":"e_1_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1045\/september95-browne"},{"key":"e_1_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/2724717"},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2017.8167779"},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3133917"},{"key":"e_1_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/101363.101366"},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.micpro.2009.02.010"},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1007\/11596110_24"},{"key":"e_1_2_2_14_1","unstructured":"GAP. 2007. GAP - Groups Algorithms Programming - a System for Computational Discrete Algebra. www.gap-system.org .  GAP. 2007. 
GAP - Groups Algorithms Programming - a System for Computational Discrete Algebra. www.gap-system.org ."},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/2.869367"},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1186736.1186737"},{"key":"e_1_2_2_17_1","unstructured":"Intel. 2016. Intel 64 and IA32 architectures software developer\u2019s manual vol. 3A: system programming guide part 1. Intel Corporation Denver CO (2016).  Intel. 2016. Intel 64 and IA32 architectures software developer\u2019s manual vol. 3A: system programming guide part 1. Intel Corporation Denver CO (2016)."},{"key":"e_1_2_2_18_1","volume-title":"Fourth International Computer Software and Applications Conference. IEEE, 201\u2013218","author":"Kuck David J.","year":"1980","unstructured":"David J. Kuck , Robert H. Kuhn , Bruce Leasure , and Michael Wolfe . 1980 . The structure of an advanced vectorizer for pipelined processors . In Fourth International Computer Software and Applications Conference. IEEE, 201\u2013218 . David J. Kuck, Robert H. Kuhn, Bruce Leasure, and Michael Wolfe. 1980. The structure of an advanced vectorizer for pipelined processors. In Fourth International Computer Software and Applications Conference. IEEE, 201\u2013218."},{"key":"e_1_2_2_19_1","unstructured":"LAME. 2017. LAME MP3 Encoder. lame.sourceforge.net .  LAME. 2017. LAME MP3 Encoder. lame.sourceforge.net ."},{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/358438.349320"},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2594291.2594334"},{"key":"e_1_2_2_22_1","volume-title":"IEEE International Proceedings of the IEEE Workload Characterization Symposium (IISWC). 34\u201345","author":"Li Man-Lap","unstructured":"Man-Lap Li , R. Sasanka , S. V. Adve , Yen-Kuang Chen , and E. Debes . 2005. The ALPBench benchmark suite for complex multimedia applications . In IEEE International Proceedings of the IEEE Workload Characterization Symposium (IISWC). 
34\u201345 . Man-Lap Li, R. Sasanka, S. V. Adve, Yen-Kuang Chen, and E. Debes. 2005. The ALPBench benchmark suite for complex multimedia applications. In IEEE International Proceedings of the IEEE Workload Characterization Symposium (IISWC). 34\u201345."},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-13374-9_21"},{"key":"e_1_2_2_24_1","unstructured":"LLNL. 2008. ASC Sequoia Benchmark. https:\/\/asc.llnl.gov\/sequoia\/benchmarks\/ .  LLNL. 2008. ASC Sequoia Benchmark. https:\/\/asc.llnl.gov\/sequoia\/benchmarks\/ ."},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2011.68"},{"key":"e_1_2_2_26_1","unstructured":"Mozilla. 2017. Mozilla JPEG Encoder Project. github.com\/mozilla\/mozjpeg .  Mozilla. 2017. Mozilla JPEG Encoder Project. github.com\/mozilla\/mozjpeg ."},{"key":"e_1_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/1508284.1508275"},{"key":"e_1_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/1454115.1454119"},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/1152154.1152182"},{"key":"e_1_2_2_30_1","volume-title":"How to benchmark code execution times on Intel IA-32 and IA-64 instruction set architectures","author":"Paoloni Gabriele","year":"2010","unstructured":"Gabriele Paoloni . 2010. How to benchmark code execution times on Intel IA-32 and IA-64 instruction set architectures . Intel Corporation ( 2010 ), 123. Gabriele Paoloni. 2010. How to benchmark code execution times on Intel IA-32 and IA-64 instruction set architectures. Intel Corporation (2010), 123."},{"key":"e_1_2_2_31_1","volume-title":"International journal of parallel programming 41, 5","author":"Park Eunjung","year":"2013","unstructured":"Eunjung Park , John Cavazos , Louis-No\u00ebl Pouchet , C\u00e9dric Bastoul , Albert Cohen , and P Sadayappan . 2013. Predictive modeling in a polyhedral optimization space . International journal of parallel programming 41, 5 ( 2013 ), 704\u2013750. 
Eunjung Park, John Cavazos, Louis-No\u00ebl Pouchet, C\u00e9dric Bastoul, Albert Cohen, and P Sadayappan. 2013. Predictive modeling in a polyhedral optimization space. International journal of parallel programming 41, 5 (2013), 704\u2013750."},{"key":"e_1_2_2_32_1","unstructured":"Tim Peters. 1992. Livermore loops coded in C. http:\/\/www.netlib.org\/benchmark\/livermorec . (1992).  Tim Peters. 1992. Livermore loops coded in C. http:\/\/www.netlib.org\/benchmark\/livermorec . (1992)."},{"key":"e_1_2_2_33_1","unstructured":"Louis-No\u00ebl Pouchet. 2011. Polyopt\/C: A polyhedral optimizer for the ROSE compiler. http:\/\/web.cse.ohio-state.edu\/~pouchet\/ software\/polyopt .  Louis-No\u00ebl Pouchet. 2011. Polyopt\/C: A polyhedral optimizer for the ROSE compiler. http:\/\/web.cse.ohio-state.edu\/~pouchet\/ software\/polyopt ."},{"key":"e_1_2_2_34_1","volume-title":"Polybench: The polyhedral benchmark suite","author":"Pouchet Louis-No\u00ebl","year":"2012","unstructured":"Louis-No\u00ebl Pouchet . 2012 . Polybench: The polyhedral benchmark suite . http:\/\/www.cs.ucla.edu\/pouchet\/software\/polybench . (2012). Louis-No\u00ebl Pouchet. 2012. Polybench: The polyhedral benchmark suite. http:\/\/www.cs.ucla.edu\/pouchet\/software\/polybench . (2012)."},{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/1379022.1375594"},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0129626400000214"},{"key":"e_1_2_2_37_1","unstructured":"Joseph Redmon. 2013\u20132016. Darknet: Open Source Neural Networks in C. http:\/\/pjreddie.com\/darknet\/ .  Joseph Redmon. 2013\u20132016. Darknet: Open Source Neural Networks in C. http:\/\/pjreddie.com\/darknet\/ ."},{"key":"e_1_2_2_38_1","unstructured":"Peter Rundberg and Fredrik Warg. 2002. The FreeBench v1.0 Benchmark Suite. http:\/\/www.freebench.org . (2002).  Peter Rundberg and Fredrik Warg. 2002. The FreeBench v1.0 Benchmark Suite. http:\/\/www.freebench.org . 
(2002)."},{"key":"e_1_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE-NIER.2017.16"},{"key":"e_1_2_2_40_1","volume-title":"CortexSuite: A Synthetic Brain Benchmark Suite. In IEEE International Proceedings of the IEEE Workload Characterization Symposium (IISWC). 76\u201379","author":"Thomas Shelby","year":"2014","unstructured":"Shelby Thomas , Chetan Gohkale , Enrico Tanuwidjaja , Tony Chong , David Lau , Saturnino Garcia , and Michael Bedford Taylor . 2014 . CortexSuite: A Synthetic Brain Benchmark Suite. In IEEE International Proceedings of the IEEE Workload Characterization Symposium (IISWC). 76\u201379 . Shelby Thomas, Chetan Gohkale, Enrico Tanuwidjaja, Tony Chong, David Lau, Saturnino Garcia, and Michael Bedford Taylor. 2014. CortexSuite: A Synthetic Brain Benchmark Suite. In IEEE International Proceedings of the IEEE Workload Characterization Symposium (IISWC). 76\u201379."},{"key":"e_1_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1177\/1094342011414744"},{"key":"e_1_2_2_42_1","volume-title":"Proceedings of the International Symposium on Code Generation and Optimization: Feedback-directed and Runtime Optimization (CGO)). 204\u2013215","author":"Triantafyllis Spyridon","unstructured":"Spyridon Triantafyllis , Manish Vachharajani , Neil Vachharajani , and David I. August . 2003. Compiler Optimization-space Exploration . In Proceedings of the International Symposium on Code Generation and Optimization: Feedback-directed and Runtime Optimization (CGO)). 204\u2013215 . Spyridon Triantafyllis, Manish Vachharajani, Neil Vachharajani, and David I. August. 2003. Compiler Optimization-space Exploration. In Proceedings of the International Symposium on Code Generation and Optimization: Feedback-directed and Runtime Optimization (CGO)). 204\u2013215."},{"key":"e_1_2_2_43_1","unstructured":"TwoLAME. 2017. TwoLAME - MPEG Audio Layer 2 Encoder. www.twolame.org .  TwoLAME. 2017. TwoLAME - MPEG Audio Layer 2 Encoder. 
www.twolame.org ."},{"key":"e_1_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPEC.2014.7040972"},{"key":"e_1_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/3147.3165"},{"key":"e_1_2_2_46_1","volume-title":"Codecs from Xiph","unstructured":"xiph.org. 2017. Codecs from Xiph . Org Foundation . https:\/\/www.xiph.org . xiph.org. 2017. Codecs from Xiph.Org Foundation. https:\/\/www.xiph.org ."}],"container-title":["Proceedings of the ACM on Programming Languages"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3276496","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3276496","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3276496","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T01:01:59Z","timestamp":1750208519000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3276496"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,10,24]]},"references-count":44,"journal-issue":{"issue":"OOPSLA","published-print":{"date-parts":[[2018,10,24]]}},"alternative-id":["10.1145\/3276496"],"URL":"https:\/\/doi.org\/10.1145\/3276496","relation":{},"ISSN":["2475-1421"],"issn-type":[{"value":"2475-1421","type":"electronic"}],"subject":[],"published":{"date-parts":[[2018,10,24]]},"assertion":[{"value":"2018-10-24","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}