{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,5]],"date-time":"2025-10-05T04:14:56Z","timestamp":1759637696136,"version":"3.41.0"},"reference-count":26,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2009,9,1]],"date-time":"2009-09-01T00:00:00Z","timestamp":1251763200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["CCR-9988238"],"award-info":[{"award-number":["CCR-9988238"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2009,9]]},"abstract":"<jats:p>As technology has advanced, the application space of Very Long Instruction Word (VLIW) processors has grown to include a variety of embedded platforms. Due to cost and power consumption constraints, many embedded VLIW processors contain limited resources, including registers. As a result, a VLIW compiler that maximizes instruction level parallelism (ILP) without considering register constraints may generate excessive register spills, leading to reduced overall system performance. To address this issue, this article presents a new spill reduction technique that improves VLIW runtime performance by reordering operations prior to register allocation and instruction scheduling. Unlike earlier algorithms, our approach explicitly considers both register reduction and data dependency in performing operation reordering. Data dependency control limits unexpected schedule length increases during subsequent instruction scheduling. 
Our technique has been evaluated using Trimaran, an academic VLIW compiler, and evaluated using a set of embedded systems benchmarks. Experimental results show that, on average, this technique improves VLIW performance by 10% for VLIW processors with 32 registers and 8 functional units compared with previous spill reduction techniques. Limited improvement is seen versus prior approaches for VLIW processors with 64 registers and 8 functional units.<\/jats:p>","DOI":"10.1145\/1582710.1582713","type":"journal-article","created":{"date-parts":[[2009,10,6]],"date-time":"2009-10-06T18:18:59Z","timestamp":1254853139000},"page":"1-40","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Tetris-XL"],"prefix":"10.1145","volume":"6","author":[{"given":"Weifeng","family":"Xu","sequence":"first","affiliation":[{"name":"University of Massachusetts Amherst, Amherst, MA"}]},{"given":"Russell","family":"Tessier","sequence":"additional","affiliation":[{"name":"University of Massachusetts Amherst, Amherst, MA"}]}],"member":"320","published-online":{"date-parts":[[2009,10,2]]},"reference":[{"volume-title":"Proceedings of the IFIP Working Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism. Springer","author":"Berson D. A.","key":"e_1_2_1_1_1","unstructured":"Berson, D. A., Gupta, R., and Soffa, M. L. 1993. URSA: A unified resource allocator for registers and functional units in VLIW architectures. In Proceedings of the IFIP Working Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism.
Springer, Berlin, 243--254."},{"volume-title":"Proceedings of the International Workshop on Languages and Compilers for Parallel Computing. Springer","author":"Berson D. A.","key":"e_1_2_1_2_1","unstructured":"Berson, D. A., Gupta, R., and Soffa, M. L. 1998. Integrated instruction scheduling and register allocation techniques. In Proceedings of the International Workshop on Languages and Compilers for Parallel Computing. Springer, Berlin, 247--262."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1254766.1254782"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/73141.74843"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/800230.806984"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1007\/11532378_4"},{"volume-title":"Proceedings of the Conference of the Advanced School for Computing and Imaging. 52--57","author":"Cilio A.","key":"e_1_2_1_8_1","unstructured":"Cilio, A. and Corporaal, H. 1999. Global program optimization: Register allocation of static scalar objects. In Proceedings of the Conference of the Advanced School for Computing and Imaging. 52--57."},{"key":"e_1_2_1_9_1","unstructured":"Cormen, T. H., Leiserson, C. E., and Rivest, R. L. 1990. Introduction to Algorithms.
McGraw-Hill New York."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.2307\/1969503"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/339647.339682"},{"key":"e_1_2_1_12_1","unstructured":"Freescale Semiconductor Inc. 2005. MSC8101 Reference Manual. Freescale Semiconductor Inc. http:\/\/www.datasheetcatalog.org\/datasheets2\/17\/1767447_1.pdf"},{"volume-title":"Proceedings of the International Workshop on Code Generation. ACM","author":"Freudenberger S. M.","key":"e_1_2_1_13_1","unstructured":"Freudenberger, S. M. and Ruttenberg, J. C. 1991. Phase ordering of register allocation and instruction scheduling. In Proceedings of the International Workshop on Code Generation. ACM, New York, 146--172."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/55364.55407"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/5.558718"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2003.1159750"},{"key":"e_1_2_1_17_1","volume-title":"Computer Architecture: A Quantitative Approach. Morgan Kaufmann","author":"Hennessy J. L.","year":"1996","unstructured":"Hennessy, J. L. and Patterson, D. A. 1996. Computer Architecture: A Quantitative Approach. Morgan Kaufmann, San Francisco, CA."},{"volume-title":"Proceedings of the International Symposium on Microarchitecture. ACM","author":"Lee C.","key":"e_1_2_1_19_1","unstructured":"Lee, C., Potkonjak, M., and Mangione-Smith, W. H. 1997.
MediaBench: A tool for evaluating and synthesizing multimedia and communications systems. In Proceedings of the International Symposium on Microarchitecture. ACM, New York, 330--335."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/329166.329208"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/169627.169839"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/155090.155114"},{"key":"e_1_2_1_23_1","unstructured":"Texas Instruments Inc. 2000. TMS320C6000 CPU and Instruction Set Reference Guide. Texas Instruments Inc. http:\/\/focus.ti.com\/lit\/ug\/spru189g\/spru189g.pdf"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.5555\/647477.727780"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10766-005-6466-x"},{"key":"e_1_2_1_26_1","unstructured":"Transmeta Inc. 2005. Transmeta Efficeon TM8820 Processor. Transmeta Inc.
http:\/\/datasheets.chipdb.org\/Transmeta\/pdfs\/brochures\/tmta_efficeon_tm8820.pdf"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/1254766.1254783"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0165-1684(03)00089-6"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1582710.1582713","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1582710.1582713","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T13:30:08Z","timestamp":1750253408000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1582710.1582713"}},"subtitle":["A performance-driven spill reduction technique for embedded VLIW processors"],"short-title":[],"issued":{"date-parts":[[2009,9]]},"references-count":26,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2009,9]]}},"alternative-id":["10.1145\/1582710.1582713"],"URL":"https:\/\/doi.org\/10.1145\/1582710.1582713","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2009,9]]},"assertion":[{"value":"2007-09-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2009-03-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2009-10-02","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}