{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,4]],"date-time":"2025-11-04T16:10:49Z","timestamp":1762272649570,"version":"3.41.0"},"reference-count":48,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2020,9,30]],"date-time":"2020-09-30T00:00:00Z","timestamp":1601424000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2020,12,31]]},"abstract":"<jats:p>Polyhedral auto-transformation frameworks are known to find efficient loop transformations that maximize locality and parallelism and minimize synchronization. While complex loop transformations are routinely modeled in these frameworks, they tend to rely on ad hoc heuristics for loop fusion. Although there exist multiple loop fusion models with cost functions to maximize locality and parallelism, these models involve separate optimization steps rather than seamlessly integrating with other loop transformations like loop permutation, scaling, and shifting. Incorporating parallelism-preserving loop fusion heuristics into existing affine transformation frameworks like Pluto, LLVM-Polly, PPCG, and PoCC requires solving a large number of Integer Linear Programming formulations, which increase auto-transformation times significantly.<\/jats:p>\n          <jats:p>\n            In this work, we incorporate polynomial time loop fusion heuristics into the\n            <jats:italic>Pluto-lp-dfp<\/jats:italic>\n            framework. We present a data structure called the\n            <jats:italic>fusion conflict graph<\/jats:italic>\n            (FCG), which enables us to efficiently model loop fusion in the presence of other affine loop transformations. We propose a clustering heuristic to group the vertices of the FCG, which further enables us to provide three different polynomial time greedy fusion heuristics, namely,\n            <jats:italic>maximal fusion<\/jats:italic>\n            ,\n            <jats:italic>typed fusion<\/jats:italic>\n            , and\n            <jats:italic>hybrid fusion<\/jats:italic>\n            , while maintaining the compile time improvements of Pluto-lp-dfp over Pluto. Our experiments reveal that the hybrid fusion model, in conjunction with Pluto\u2019s cost function, finds efficient transformations that outperform PoCC and Pluto by mean factors of 1.8\u00d7 and 1.07\u00d7, respectively, with a maximum performance improvement of 14\u00d7 over PoCC and 2.6\u00d7 over Pluto.\n          <\/jats:p>","DOI":"10.1145\/3416510","type":"journal-article","created":{"date-parts":[[2020,9,30]],"date-time":"2020-09-30T11:23:50Z","timestamp":1601465030000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":9,"title":["Effective Loop Fusion in Polyhedral Compilation Using Fusion Conflict Graphs"],"prefix":"10.1145","volume":"17","author":[{"given":"Aravind","family":"Acharya","sequence":"first","affiliation":[{"name":"Indian Institute of Science, India"}]},{"given":"Uday","family":"Bondhugula","sequence":"additional","affiliation":[{"name":"Indian Institute of Science, India"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8866-5343","authenticated-orcid":false,"given":"Albert","family":"Cohen","sequence":"additional","affiliation":[{"name":"Google, France"}]}],"member":"320","published-online":{"date-parts":[[2020,9,30]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/3192366.3192401"},{"key":"e_1_2_1_2_1","doi-asserted-by":"crossref","unstructured":"Vinayaka Bandishti Irshad Pananilath and Uday Bondhugula. 2012. Tiling stencil computations to maximize parallelism. In Supercomputing. Article 40 11 pages.  Vinayaka Bandishti Irshad Pananilath and Uday Bondhugula. 2012. Tiling stencil computations to maximize parallelism. In Supercomputing. Article 40 11 pages.","DOI":"10.1109\/SC.2012.107"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/2896389"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2016.2615094"},{"volume-title":"Proceedings of the Joint European Conferences on Theory and Practice of Software 17th International Conference on Compiler Construction (CC\u201908\/ETAPS\u201908)","author":"Bondhugula Uday","key":"e_1_2_1_5_1","unstructured":"Uday Bondhugula , Muthu Baskaran , Sriram Krishnamoorthy , J. Ramanujam , Atanas Rountev , and P. Sadayappan . 2008. Automatic transformations for communication-minimized parallelization and locality optimization in the polyhedral model . In Proceedings of the Joint European Conferences on Theory and Practice of Software 17th International Conference on Compiler Construction (CC\u201908\/ETAPS\u201908) . 132--146. Uday Bondhugula, Muthu Baskaran, Sriram Krishnamoorthy, J. Ramanujam, Atanas Rountev, and P. Sadayappan. 2008. Automatic transformations for communication-minimized parallelization and locality optimization in the polyhedral model. In Proceedings of the Joint European Conferences on Theory and Practice of Software 17th International Conference on Compiler Construction (CC\u201908\/ETAPS\u201908). 132--146."},{"volume-title":"Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI\u201908)","author":"Bondhugula Uday","key":"e_1_2_1_6_1","unstructured":"Uday Bondhugula , Albert Hartono , J. Ramanujam , and P. Sadayappan . 2008. A practical automatic polyhedral parallelizer and locality optimizer . In Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI\u201908) . 101--113. Uday Bondhugula, Albert Hartono, J. Ramanujam, and P. Sadayappan. 2008. A practical automatic polyhedral parallelizer and locality optimizer. In Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI\u201908). 101--113."},{"key":"e_1_2_1_7_1","unstructured":"Cloog 2004. The Chunky Loop Generator. Retrieved from http:\/\/www.cloog.org.  Cloog 2004. The Chunky Loop Generator. Retrieved from http:\/\/www.cloog.org."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.1999.807510"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-8191(00)00034-X"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1051\/ro\/1988220302431"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF01407835"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF01379404"},{"key":"e_1_2_1_13_1","unstructured":"GNU. [n.d.]. GLPK (GNU Linear Programming Kit). Retrieved from https:\/\/www.gnu.org\/software\/glpk\/.  GNU. [n.d.]. GLPK (GNU Linear Programming Kit). Retrieved from https:\/\/www.gnu.org\/software\/glpk\/."},{"key":"e_1_2_1_14_1","volume-title":"Proceedings of the 1st International Workshop on Polyhedral Compilation Techniques (IMPACT\u201911)","author":"Grosser Tobias","year":"2011","unstructured":"Tobias Grosser , Hongbin Zheng , Ragesh Aloor , Andreas Simburger , Armin Groslinger , and Louis-No\u00ebl Pouchet . 2011 . Polly\u2014polyhedral optimization in LLVM . In Proceedings of the 1st International Workshop on Polyhedral Compilation Techniques (IMPACT\u201911) . Tobias Grosser, Hongbin Zheng, Ragesh Aloor, Andreas Simburger, Armin Groslinger, and Louis-No\u00ebl Pouchet. 2011. Polly\u2014polyhedral optimization in LLVM. In Proceedings of the 1st International Workshop on Polyhedral Compilation Techniques (IMPACT\u201911)."},{"key":"e_1_2_1_15_1","volume-title":"Gurobi Optimization","author":"Inc.","year":"2016","unstructured":"Inc. Gurobi Optimization . 2016 . Gurobi Optimizer Reference Manual. Retrieved from http:\/\/www.gurobi.com. Inc. Gurobi Optimization. 2016. Gurobi Optimizer Reference Manual. Retrieved from http:\/\/www.gurobi.com."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3178487.3178507"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/335231.335244"},{"volume-title":"Proceedings of the ACM International Conference on Supercomputing. 323--334","author":"Kennedy K.","key":"e_1_2_1_18_1","unstructured":"K. Kennedy and K. McKinley . 1992. Optimizing for parallelism and data locality . In Proceedings of the ACM International Conference on Supercomputing. 323--334 . K. Kennedy and K. McKinley. 1992. Optimizing for parallelism and data locality. In Proceedings of the ACM International Conference on Supercomputing. 323--334."},{"volume-title":"Languages and Compilers for Parallel Computing","author":"Kennedy K.","key":"e_1_2_1_19_1","unstructured":"K. Kennedy and K. McKinley . 1993. Maximizing loop parallelism and improving data locality via loop fusion and distribution . In Languages and Compilers for Parallel Computing . Springer , 301--320. K. Kennedy and K. McKinley. 1993. Maximizing loop parallelism and improving data locality via loop fusion and distribution. In Languages and Compilers for Parallel Computing. Springer, 301--320."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3314221.3314653"},{"volume-title":"Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI\u201913)","author":"Kong Martin","key":"e_1_2_1_21_1","unstructured":"Martin Kong , Richard Veras , Kevin Stock , Franz Franchetti , Louis-No\u00ebl Pouchet , and P. Sadayappan . 2013. When polyhedral transformations meet SIMD code generation . In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI\u201913) . 127--138. Martin Kong, Richard Veras, Kevin Stock, Franz Franchetti, Louis-No\u00ebl Pouchet, and P. Sadayappan. 2013. When polyhedral transformations meet SIMD code generation. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI\u201913). 127--138."},{"volume-title":"Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI\u201907)","author":"Krishnamoorthy Sriram","key":"e_1_2_1_22_1","unstructured":"Sriram Krishnamoorthy , Muthu Baskaran , Uday Bondhugula , J. Ramanujam , Atanas Rountev , and P. Sadayappan . 2007. Effective automatic parallelization of stencil computations . In Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI\u201907) . 235--244. Sriram Krishnamoorthy, Muthu Baskaran, Uday Bondhugula, J. Ramanujam, Atanas Rountev, and P. Sadayappan. 2007. Effective automatic parallelization of stencil computations. In Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI\u201907). 235--244."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-8191(98)00021-0"},{"volume-title":"Proceedings of the 13th International Conference on Supercomputing (ICS\u201999)","author":"Lim Amy W.","key":"e_1_2_1_24_1","unstructured":"Amy W. Lim , Gerald I. Cheong , and Monica S. Lam . 1999. An affine partitioning algorithm to maximize parallelism and minimize communication . In Proceedings of the 13th International Conference on Supercomputing (ICS\u201999) . 228--237. Amy W. Lim, Gerald I. Cheong, and Monica S. Lam. 1999. An affine partitioning algorithm to maximize parallelism and minimize communication. In Proceedings of the 13th International Conference on Supercomputing (ICS\u201999). 228--237."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/258492.258520"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/2555243.2555250"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/2737924.2737954"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2963101"},{"volume-title":"Encyclopedia of Parallel Computing","author":"Meister Beno\u00eet","key":"e_1_2_1_29_1","unstructured":"Beno\u00eet Meister , Nicolas Vasilache , David Wohlford , Muthu Baskaran , Allen Leung , and Richard Lethin . 2011. R-stream compiler . In Encyclopedia of Parallel Computing . Springer , 1756--1765. Beno\u00eet Meister, Nicolas Vasilache, David Wohlford, Muthu Baskaran, Allen Leung, and Richard Lethin. 2011. R-stream compiler. In Encyclopedia of Parallel Computing. Springer, 1756--1765."},{"key":"e_1_2_1_30_1","unstructured":"MLIR 2019. Multi-Level Intermediate Representation Compiler Infrastructure. Retrieved from https:\/\/mlir.llvm.org.  MLIR 2019. Multi-Level Intermediate Representation Compiler Infrastructure. Retrieved from https:\/\/mlir.llvm.org."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2897824.2925952"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2694344.2694364"},{"key":"e_1_2_1_33_1","unstructured":"Pluto 2008. PLUTO: A Polyhedral Automatic Parallelizer and Locality Optimizer for Multicores. Retrieved from https:\/\/github.com\/bondhugula\/pluto.  Pluto 2008. PLUTO: A Polyhedral Automatic Parallelizer and Locality Optimizer for Multicores. Retrieved from https:\/\/github.com\/bondhugula\/pluto."},{"key":"e_1_2_1_34_1","unstructured":"PoCC 2019. The Polyhedral Compiler Collection. Obtained via private communication.  PoCC 2019. The Polyhedral Compiler Collection. Obtained via private communication."},{"key":"e_1_2_1_35_1","unstructured":"Polybench 2010. Polybench suite. Retrieved from http:\/\/polybench.sourceforge.net.  Polybench 2010. Polybench suite. Retrieved from http:\/\/polybench.sourceforge.net."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/1926385.1926449"},{"key":"e_1_2_1_37_1","unstructured":"PPCG 2013. Polyhedral Parallel Code Generator. Retrieved from https:\/\/github.com\/Meinersbur\/ppcg.  PPCG 2013. Polyhedral Parallel Code Generator. Retrieved from https:\/\/github.com\/Meinersbur\/ppcg."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/2491956.2462176"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1147\/rd.413.0233"},{"volume-title":"Proceedings of the 5th International Conference on Supercomputing (ICS\u201991)","author":"Sarkar Vivek","key":"e_1_2_1_40_1","unstructured":"Vivek Sarkar and Guang R. Gao . 1991. Optimization of array accesses by collective loop transformations . In Proceedings of the 5th International Conference on Supercomputing (ICS\u201991) . 194--205. Vivek Sarkar and Guang R. Gao. 1991. Optimization of array accesses by collective loop transformations. In Proceedings of the 5th International Conference on Supercomputing (ICS\u201991). 194--205."},{"volume-title":"Theory of Linear and Integer Programming","author":"Schrijver Alexander","key":"e_1_2_1_41_1","unstructured":"Alexander Schrijver . 1986. Theory of Linear and Integer Programming . John Wiley 8 Sons. Alexander Schrijver. 1986. Theory of Linear and Integer Programming. John Wiley 8 Sons."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2014.29"},{"key":"e_1_2_1_43_1","volume-title":"Proceedings of the International Workshop on Polyhedral Compilation Techniques (IMPACT\u201912)","author":"Vasilache Nicolas","year":"2012","unstructured":"Nicolas Vasilache , Benoit Meister , Muthu Baskaran , and Richard Lethin . 2012 . Joint scheduling and layout optimization to enable multi-level vectorization . In Proceedings of the International Workshop on Polyhedral Compilation Techniques (IMPACT\u201912) . Nicolas Vasilache, Benoit Meister, Muthu Baskaran, and Richard Lethin. 2012. Joint scheduling and layout optimization to enable multi-level vectorization. In Proceedings of the International Workshop on Polyhedral Compilation Techniques (IMPACT\u201912)."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.5555\/1888390.1888455"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/2400682.2400713"},{"key":"e_1_2_1_46_1","volume-title":"Proceedings of the International Workshop on Polyhedral Compilation Techniques (IMPACT\u201916)","author":"Verdoolaege Sven","year":"2016","unstructured":"Sven Verdoolaege and Albert Cohen . 2016 . Live-range reordering . In Proceedings of the International Workshop on Polyhedral Compilation Techniques (IMPACT\u201916) . Sven Verdoolaege and Albert Cohen. 2016. Live-range reordering. In Proceedings of the International Workshop on Polyhedral Compilation Techniques (IMPACT\u201916)."},{"key":"e_1_2_1_47_1","unstructured":"XLA 2017. Optimizing Compiler for Machine Learning. Retrieved from https:\/\/www.tensorflow.org\/xla.  XLA 2017. Optimizing Compiler for Machine Learning. Retrieved from https:\/\/www.tensorflow.org\/xla."},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/3178372.3179507"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3416510","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3416510","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:41:30Z","timestamp":1750200090000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3416510"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,9,30]]},"references-count":48,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2020,12,31]]}},"alternative-id":["10.1145\/3416510"],"URL":"https:\/\/doi.org\/10.1145\/3416510","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2020,9,30]]},"assertion":[{"value":"2020-02-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-07-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-09-30","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}