{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T14:36:50Z","timestamp":1775054210479,"version":"3.50.1"},"reference-count":49,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2012,6,1]],"date-time":"2012-06-01T00:00:00Z","timestamp":1338508800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2012,6]]},"abstract":"<jats:p>Code generation in a compiler is commonly divided into several phases: instruction selection, scheduling, register allocation, spill code generation, and, in the case of clustered architectures, cluster assignment. These phases are interdependent; for instance, a decision in the instruction selection phase affects how an operation can be scheduled. We examine the effect of this separation of phases on the quality of the generated code. To study this we have formulated optimal methods for code generation with integer linear programming; first for acyclic code and then we extend this method to modulo scheduling of loops. In our experiments we compare optimal modulo scheduling, where all phases are integrated, to modulo scheduling, where instruction selection and cluster assignment are done in a separate phase. 
The results show that, for an architecture with two clusters, the integrated method finds a better solution than the nonintegrated method for 27% of the instances.<\/jats:p>","DOI":"10.1145\/2180887.2180896","type":"journal-article","created":{"date-parts":[[2012,6,11]],"date-time":"2012-06-11T13:03:21Z","timestamp":1339419801000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":11,"title":["Integrated Code Generation for Loops"],"prefix":"10.1145","volume":"11S","author":[{"given":"Mattias","family":"Eriksson","sequence":"first","affiliation":[{"name":"Link\u00f6ping University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Christoph","family":"Kessler","sequence":"additional","affiliation":[{"name":"Link\u00f6ping University"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2012,6]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/372202.372787"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/207110.207128"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1007\/11823285_48"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1007\/11823285_30"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0898-1221(97)00184-3"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/C-M.1981.220595"},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT\u201901)","author":"Codina J. M.","unstructured":"Codina, J. M., S\u00e1nchez, J., and Gonz\u00e1lez, A. 
2001. A unified modulo scheduling and register allocation technique for clustered processors. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT\u201901). IEEE, 175--184."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.5555\/226099.2949459"},{"key":"e_1_2_1_9_1","volume-title":"Proceedings of the 6th Workshop on Compilers for Parallel Computers (CPC\u201996)","author":"Eisenbeis C.","unstructured":"Eisenbeis, C. and Sawaya, A. 1996. Optimal loop parallelization under register constraints. In Proceedings of the 6th Workshop on Compilers for Parallel Computers (CPC\u201996). 245--259."},{"key":"e_1_2_1_10_1","volume-title":"Link\u00f6ping Studies in Science and Technology Thesis No. 1393","author":"Eriksson M.","unstructured":"Eriksson, M. 2009. Integrated software pipelining. Licentiate degree thesis, Link\u00f6ping Studies in Science and Technology Thesis No. 1393, Link\u00f6ping University, Sweden."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-92990-1_7"},{"key":"e_1_2_1_12_1","volume-title":"Proceedings of the 11th International Workshop on Software & Compilers for Embedded Systems (SCOPES\u201908)","author":"Eriksson M. V.","unstructured":"Eriksson, M. V., Skoog, O., and Kessler, C. W. 2008. Optimal vs. 
heuristic integrated code generation for clustered VLIW architectures. In Proceedings of the 11th International Workshop on Software & Compilers for Embedded Systems (SCOPES\u201908). ACM, New York, 11--20."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2005.17"},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the 5th International Symposium on High Performance Computer Architecture (HPCA\u201999)","author":"Fernandes M. M.","unstructured":"Fernandes, M. M., Llosa, J., and Topham, N. 1999. Distributed modulo scheduling. In Proceedings of the 5th International Symposium on High Performance Computer Architecture (HPCA\u201999). IEEE, 130."},{"key":"e_1_2_1_16_1","volume-title":"Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA\u201902)","author":"Fimmel D.","unstructured":"Fimmel, D. and M\u00fcller, J. 2002. Optimal software pipelining with rational initiation interval. In Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA\u201902). 
CSREA Press, 638--643."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/800046.801649"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/43.240074"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/277044.277184"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/155090.155115"},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the 7th International Symposium on High-Performance Computer Architecture (HPCA\u201901)","author":"Kailas K.","unstructured":"Kailas, K., Ebcioglu, K., and Agrawala, A. 2001. CARS: A new code generation framework for clustered ILP processors. In Proceedings of the 7th International Symposium on High-Performance Computer Architecture (HPCA\u201901). IEEE, 133--143."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.5555\/646906.710512"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.v18:11"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.v19:18"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/960116.54022"},{"key":"e_1_2_1_26_1","volume-title":"Proceedings of the International Symposium on Microarchitecture. IEEE, 330--335","author":"Lee C.","unstructured":"Lee, C., Potkonjak, M., and Mangione-Smith, W. H. 1997. Mediabench: A tool for evaluating and synthesizing multimedia and communications systems. In Proceedings of the International Symposium on Microarchitecture. 
IEEE, 330--335."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.5555\/517554.825774"},{"key":"e_1_2_1_28_1","volume-title":"Proceedings of the 28th Annual International Symposium on Microarchitecture. IEEE, 350--360","author":"Llosa J.","unstructured":"Llosa, J., Valero, M., Ayguad\u00e9, E., and Gonz\u00e1lez, A. 1995. Hypernode reduction modulo scheduling. In Proceedings of the 28th Annual International Symposium on Microarchitecture. IEEE, 350--360."},{"key":"e_1_2_1_29_1","volume-title":"Proceedings of the Conference on Parallel Architectures and Compilation Techniques (PACT\u201996)","author":"Llosa J.","unstructured":"Llosa, J., Gonzalez, A., Ayguade, E., and Valero, M. 1996. Swing modulo scheduling: A lifetime-sensitive approach. In Proceedings of the Conference on Parallel Architectures and Compilation Techniques (PACT\u201996). IEEE, 80--86."},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the Conference on Design, Automation and Test in Europe (DATE\u201904)","author":"Lorenz M.","unstructured":"Lorenz, M. and Marwedel, P. 2004. Phase coupled code generation for DSPs using a genetic algorithm. In Proceedings of the Conference on Design, Automation and Test in Europe (DATE\u201904). 
IEEE, 1270--1275."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-85958-1_7"},{"key":"e_1_2_1_32_1","volume-title":"Proceedings of the 16th International Conference on Compiler Construction. Springer, 126--140","author":"Nagarakatte S. G.","unstructured":"Nagarakatte, S. G. and Govindarajan, R. 2007. Register allocation and optimal spill code scheduling in software pipelined loops using 0-1 integer linear programming formulation. In Proceedings of the 16th International Conference on Compiler Construction. Springer, 126--140."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/977091.977155"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/158511.158519"},{"key":"e_1_2_1_35_1","volume-title":"Proceedings of the 31st Annual ACM\/IEEE International Symposium on Microarchitecture. IEEE, 103--114","author":"Nystrom E.","unstructured":"Nystrom, E. and Eichenberger, A. E. 1998. Effective cluster assignment for modulo scheduling. In Proceedings of the 31st Annual ACM\/IEEE International Symposium on Microarchitecture. IEEE, 103--114."},{"key":"e_1_2_1_36_1","volume-title":"Proceedings of the 31st Annual ACM\/IEEE International Symposium on Microarchitecture. IEEE, 308--315","author":"Ozer E.","unstructured":"Ozer, E., Banerjia, S., and Conte, T. M. 1998. Unified assign and schedule: A new approach to scheduling for clustered register file microarchitectures. 
In Proceedings of the 31st Annual ACM\/IEEE International Symposium on Microarchitecture. IEEE, 308--315."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/1140389.1140395"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/192724.192731"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/1014192.802449"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/231379.231385"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/315253.314427"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1137\/0202017"},{"key":"e_1_2_1_43_1","unstructured":"Texas Instruments Incorporated. 2000. TMS320C6000 CPU and Instruction Set Reference Guide."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2007.70752"},{"key":"e_1_2_1_45_1","unstructured":"Touati S.-A.-A. 2009. Data dependence graphs from Spec Mediabench and Ffmpeg benchmark suites. Personal communication."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.5555\/144953.145797"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/349299.349318"},{"key":"e_1_2_1_48_1","volume-title":"Proceedings of the International Conference on Computer Design (ICCD\u201994)","author":"Wilson T. C.","unstructured":"Wilson, T. C., Grewal, G. W., and Banerji, D. K. 1994. An ILP solution for simultaneous scheduling, allocation, and binding in multiple block synthesis. 
In Proceedings of the International Conference on Computer Design (ICCD\u201994). IEEE, 581--586."},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.5555\/1331699.1331707"},{"key":"e_1_2_1_51_1","volume-title":"Proceedings of the IEEE International Conference on Computer Design: VLSI in Computers and Processors (ICCD\u201902)","author":"Yang H.","unstructured":"Yang, H., Govindarajan, R., Gao, G. R., and Theobald, K. B. 2002. Power-performance trade-offs for energy-efficient architectures: A quantitative study. In Proceedings of the IEEE International Conference on Computer Design: VLSI in Computers and Processors (ICCD\u201902). 
IEEE, 174."}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2180887.2180896","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2180887.2180896","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T09:54:22Z","timestamp":1750240462000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2180887.2180896"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,6]]},"references-count":49,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2012,6]]}},"alternative-id":["10.1145\/2180887.2180896"],"URL":"https:\/\/doi.org\/10.1145\/2180887.2180896","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"value":"1539-9087","type":"print"},{"value":"1558-3465","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,6]]},"assertion":[{"value":"2009-06-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2010-03-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2012-06-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}