{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,14]],"date-time":"2026-05-14T07:16:57Z","timestamp":1778743017707,"version":"3.51.4"},"reference-count":42,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2009,3,30]],"date-time":"2009-03-30T00:00:00Z","timestamp":1238371200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2009,3,30]]},"abstract":"<jats:p>\n            Retargetable C compilers are currently widely used to quickly obtain compiler support for new embedded processors and to perform early processor architecture exploration. A partially inherent problem of the retargetable compilation approach, though, is the limited code quality as compared to hand-written compilers or assembly code due to the lack of dedicated optimizations techniques. This problem can be circumvented by designing flexible, retargetable code optimization techniques that apply to a certain range of target architectures. This article focuses on target machines with SIMD instruction support, a common feature in embedded processors for multimedia applications. However, SIMD optimization is known to be a difficult task since SIMD architectures are largely nonuniform, support only a limited set of data types and impose several memory alignment constraints. Additionally, such techniques require complicated loop transformations, which are tailored to the SIMD architecture in order to exhibit the necessary amount of parallelism in the code. Thus, integrating the SIMD optimization\n            <jats:italic>and<\/jats:italic>\n            the required loop transformations together in a single retargeting formalism is an ambitious challenge. 
In this article, we present an efficient and quickly retargetable SIMD code optimization framework that is integrated into an industrial retargetable C compiler. Experimental results for different processors demonstrate that the proposed technique applies to real-life target machines and that it produces code quality improvements close to the theoretical limit.\n          <\/jats:p>","DOI":"10.1145\/1509864.1509866","type":"journal-article","created":{"date-parts":[[2009,4,6]],"date-time":"2009-04-06T16:34:22Z","timestamp":1239035662000},"page":"1-27","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":17,"title":["A SIMD optimization framework for retargetable compilers"],"prefix":"10.1145","volume":"6","author":[{"given":"Manuel","family":"Hohenauer","sequence":"first","affiliation":[{"name":"RWTH Aachen University, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Felix","family":"Engel","sequence":"additional","affiliation":[{"name":"RWTH Aachen University, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rainer","family":"Leupers","sequence":"additional","affiliation":[{"name":"RWTH Aachen University, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Gerd","family":"Ascheid","sequence":"additional","affiliation":[{"name":"RWTH Aachen University, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Heinrich","family":"Meyr","sequence":"additional","affiliation":[{"name":"RWTH Aachen University, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2009,4,2]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Associated Computer Experts (ACE). The COSY compiler development system. http:\/\/www.ace.nl."},{"key":"e_1_2_1_2_1","unstructured":"Advanced RISC Machines Ltd. 
The ARM11 processor. http:\/\/www.arm.com."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/567067.567085"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/29873.29875"},{"key":"e_1_2_1_5_1","volume-title":"Proceedings of the 2nd SUIF Compiler Workshop","author":"Cheong G.","unstructured":"Cheong, G. and Lam, M. S. 1997. An optimizer for multimedia instruction sets. In Proceedings of the 2nd SUIF Compiler Workshop. Stanford University, CA."},{"key":"e_1_2_1_6_1","unstructured":"Coware Inc. Processor Designer. http:\/\/www.coware.com."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/996841.996853"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/309847.309923"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2004.840491"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/151640.151642"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/131080.131089"},{"key":"e_1_2_1_12_1","volume-title":"Proceedings of the 13th Annual IEEE International ASIC\/SOC Conference. IEEE","author":"Gl\u00f6ckler T.","unstructured":"Gl\u00f6ckler, T., Bitterlich, S., and Meyr, H. 2000. ICORE: a low-power application specific instruction set processor for DVB-T acquisition and tracking. In Proceedings of the 13th Annual IEEE International ASIC\/SOC Conference. 
IEEE, Los Alamitos, CA."},{"key":"e_1_2_1_13_1","unstructured":"GNU Compiler Collection. Auto-vectorization in GCC. http:\/\/gcc.gnu.org\/projects\/tree-ssa\/vectorization.html."},{"key":"e_1_2_1_14_1","doi-asserted-by":"crossref","unstructured":"Gries M. and Keutzer K. 2005. Building ASIPs: The Mescal Methodology. Springer-Verlag Berlin Germany.","DOI":"10.1007\/b136892"},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the Conference on Design, Automation &amp; Test in Europe (DATE'01)","author":"Hoffmann A.","unstructured":"Hoffmann, A., Kogel, T., and Meyr, H. 2001. A framework for fast hardware-software co-simulation. In Proceedings of the Conference on Design, Automation &amp; Test in Europe (DATE'01). IEEE, Los Alamitos, CA."},{"key":"e_1_2_1_16_1","doi-asserted-by":"crossref","unstructured":"Hoffmann A. Meyr H. and Leupers R. 2002. Architecture Exploration for Embedded Processors with LISA. Kluwer Academic Publishers The Netherlands","DOI":"10.1007\/978-1-4757-4538-2"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.5555\/968879.969149"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1176254.1176291"},{"key":"e_1_2_1_19_1","unstructured":"Intel Corporation. Intel C compiler. 
http:\/\/www.intel.com."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/370155.370573"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1007507005174"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1065910.1065931"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/349299.349320"},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT'02)","author":"Larsen S. W. E.","unstructured":"Larsen, S. W. E. and Amarasinghe, S. P. 2002. Increasing and detecting memory address congruence. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT'02). IEEE, Los Alamitos, CA, 18--29."},{"key":"e_1_2_1_25_1","volume-title":"Code Optimization Techniques for Embedded Processors","author":"Leupers R.","unstructured":"Leupers, R. 2000a. Code Optimization Techniques for Embedded Processors. Kluwer Academic Publishers, The Netherlands."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/343647.343679"},{"key":"e_1_2_1_27_1","doi-asserted-by":"crossref","unstructured":"Leupers R. and Marwedel P. 2001. Retargetable Compiler Technology for Embedded Systems. 
Kluwer Academic Publishers The Netherlands.","DOI":"10.1007\/978-1-4757-6420-8"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/500001.500061"},{"key":"e_1_2_1_29_1","volume-title":"Advanced Compiler Design &amp","author":"Muchnick S. S.","unstructured":"Muchnick, S. S. 1997. Advanced Compiler Design &amp; Implementation. Morgan Kaufmann Publishers, San Francisco, CA."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2006.25"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/1133981.1133997"},{"key":"e_1_2_1_32_1","unstructured":"NXP Semiconductors. The TriMedia media processor. http:\/\/www.nxp.com."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1023\/B:IJPP.0000004675.70367.00"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.v16:2\/3"},{"key":"e_1_2_1_35_1","volume-title":"Proceedings of the 5th Workshop on Media and Streaming Processors. ACM","author":"Pryanishnikov I.","unstructured":"Pryanishnikov, I., Krall, A., and Horspool, N. 2003. Pointer alignment analysis for processors with SIMD instructions. In Proceedings of the 5th Workshop on Media and Streaming Processors. ACM, New York."},{"key":"e_1_2_1_36_1","volume-title":"Proceedings of the 16th International Workshop of Languages and Compilers for Parallel Computing. Springer","author":"Ren G.","unstructured":"Ren, G., Wu, P., and Padua, D. 2003. A preliminary study on the vectorization of multimedia applications for multimedia extensions. 
In Proceedings of the 16th International Workshop of Languages and Compilers for Parallel Computing. Springer, Berlin, Germany."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/1133981.1133996"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1007\/11532378_18"},{"key":"e_1_2_1_39_1","unstructured":"Tensilica Inc. Xtensa C compiler. http:\/\/www.tensilica.com."},{"key":"e_1_2_1_40_1","volume-title":"High Performance Compilers for Parallel Computing","author":"Wolfe M. J.","unstructured":"Wolfe, M. J. 1995. High Performance Compilers for Parallel Computing. Addison-Wesley Longman, Boston, MA."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2005.18"},{"key":"e_1_2_1_42_1","volume-title":"Proceedings of the International Conference on Signal Processing Applications and Technology (ICSPAT). IASTED, Calgary, Alberta.","author":"Zivojnovic V.","unstructured":"Zivojnovic, V., Velarde, J., Schl\u00e4ger, C., and Meyr, H. 1994. DSPStone\u2014a DSP-oriented benchmarking methodology. In Proceedings of the International Conference on Signal Processing Applications and Technology (ICSPAT). 
IASTED, Calgary, Alberta."}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1509864.1509866","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1509864.1509866","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T13:57:56Z","timestamp":1750255076000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1509864.1509866"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,3,30]]},"references-count":42,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2009,3,30]]}},"alternative-id":["10.1145\/1509864.1509866"],"URL":"https:\/\/doi.org\/10.1145\/1509864.1509866","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2009,3,30]]},"assertion":[{"value":"2008-01-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2008-11-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2009-04-02","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}