{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:22:45Z","timestamp":1750306965076,"version":"3.41.0"},"reference-count":27,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2013,9,1]],"date-time":"2013-09-01T00:00:00Z","timestamp":1377993600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Collaborative Research Center 614"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2013,9]]},"abstract":"<jats:p>The diversity of today's mobile applications requires embedded processor cores with a high resource efficiency, that means, the devices should provide a high performance at low area requirements and power consumption. The fine-grained parallelism supported by multiple functional units of VLIW architectures offers a high throughput at reasonable low clock frequencies compared to single-core RISC processors. To efficiently utilize the processor pipeline, common system architectures have to cope with data hazards due to data dependencies between consecutive operations. On the one hand, such hazards can be resolved by complex forwarding circuits (i.e., a pipeline bypass) which forward intermediate results to a subsequent instruction. On the other hand, the pipeline bypass can strongly affect or even dominate the total resource requirements and degrade the maximum clock frequency. In this work the CoreVA VLIW architecture is used for the development and the analysis of application-specific bypass configurations. 
It is shown that many paths of a comprehensive bypass system are rarely used and may not be required for certain applications. For this reason, several strategies have been implemented to enhance the efficiency of the total system by introducing application-specific bypass configurations. The configuration can be carried out statically by only implementing required paths or at runtime by dynamically reconfiguring the hardware. An algorithm is proposed which derives an optimized configuration by iteratively disabling single bypass paths. The adaptation of these application-specific bypass configurations allows for a reduction of the critical path by 26%. As a result, the execution time and energy requirements could be reduced by up to 21.5%. Using Dynamic Frequency Scaling (DFS) and dynamic deactivation\/reactivation of bypass paths allows for a runtime reconfiguration of the bypass system. This ensures the highest efficiency while processing varying applications.<\/jats:p>","DOI":"10.1145\/2514641.2514645","type":"journal-article","created":{"date-parts":[[2013,10,1]],"date-time":"2013-10-01T18:14:28Z","timestamp":1380651268000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["A systematic approach for optimized bypass configurations for application-specific embedded processors"],"prefix":"10.1145","volume":"13","author":[{"given":"Thorsten","family":"Jungeblut","sequence":"first","affiliation":[{"name":"Bielefeld University, Germany"}]},{"given":"Boris","family":"H\u00fcbener","sequence":"additional","affiliation":[{"name":"Bielefeld University, Germany"}]},{"given":"Mario","family":"Porrmann","sequence":"additional","affiliation":[{"name":"University of Paderborn, Germany"}]},{"given":"Ulrich","family":"R\u00fcckert","sequence":"additional","affiliation":[{"name":"Bielefeld University, 
Germany"}]}],"member":"320","published-online":{"date-parts":[[2013,9,30]]},"reference":[{"volume-title":"Proceedings of the 28th Annual International Symposium on Microarchitecture (MICRO'95)","author":"Ahuja P. S.","key":"e_1_2_1_1_1","unstructured":"Ahuja , P. S. , Clark , D. W. , and Rogers , A . 1995. The performance impact of incomplete bypassing in processor pipelines . In Proceedings of the 28th Annual International Symposium on Microarchitecture (MICRO'95) . 36--45. Ahuja, P. S., Clark, D. W., and Rogers, A. 1995. The performance impact of incomplete bypassing in processor pipelines. In Proceedings of the 28th Annual International Symposium on Microarchitecture (MICRO'95). 36--45."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSPEC.1967.5217220"},{"volume-title":"Proceedings of the 8th Annual International Symposium on High-Performance Computer Architecture. 289--298","author":"Brown M. D.","key":"e_1_2_1_3_1","unstructured":"Brown , M. D. and Patt , Y. N . 2001. Using internal redundant representations and limited bypass to support pipelined adders and register files . In Proceedings of the 8th Annual International Symposium on High-Performance Computer Architecture. 289--298 . Brown, M. D. and Patt, Y. N. 2001. Using internal redundant representations and limited bypass to support pipelined adders and register files. In Proceedings of the 8th Annual International Symposium on High-Performance Computer Architecture. 289--298."},{"key":"e_1_2_1_4_1","doi-asserted-by":"crossref","unstructured":"Daemen J. and Rijmen V. 2002. The Design of Rijndael: AES--The Advanced Encryption Standard. Springer.   Daemen J. and Rijmen V. 2002. The Design of Rijndael: AES--The Advanced Encryption Standard. Springer.","DOI":"10.1007\/978-3-662-04722-4_1"},{"volume-title":"Proceedings of the International Embedded Systems Symposium (IESS'09)","author":"Dreesen R.","key":"e_1_2_1_5_1","unstructured":"Dreesen , R. , Jungeblut , T. , Thies , M. 
, Porrmann , M. , R\u00fcckert , U. , and Kastens , U . 2009. A synchronization method for register traces of pipelined processors . In Proceedings of the International Embedded Systems Symposium (IESS'09) . 207--217. Dreesen, R., Jungeblut, T., Thies, M., Porrmann, M., R\u00fcckert, U., and Kastens, U. 2009. A synchronization method for register traces of pipelined processors. In Proceedings of the International Embedded Systems Symposium (IESS'09). 207--217."},{"volume-title":"Proceedings of the 1st Open NESSIE Workshop.","author":"Ekdahl P.","key":"e_1_2_1_6_1","unstructured":"Ekdahl , P. and Johansson , T . 2000. SNOW-- A new stream cipher . In Proceedings of the 1st Open NESSIE Workshop. Ekdahl, P. and Johansson, T. 2000. SNOW-- A new stream cipher. In Proceedings of the 1st Open NESSIE Workshop."},{"volume-title":"Proceedings of the of IEEE International Conference on Application-Specific Systems, Architectures, and Processors (ASSAP'03)","author":"Fan K.","key":"e_1_2_1_7_1","unstructured":"Fan , K. , Clark , N. , Chu , M. , Manjunath , K. V. , Ravindran , R. , Smelyanskiy , M. , and Mahlke , S . 2003. Systematic register bypass customization for application-specific processors . In Proceedings of the of IEEE International Conference on Application-Specific Systems, Architectures, and Processors (ASSAP'03) . 64--74. Fan, K., Clark, N., Chu, M., Manjunath, K. V., Ravindran, R., Smelyanskiy, M., and Mahlke, S. 2003. Systematic register bypass customization for application-specific processors. In Proceedings of the of IEEE International Conference on Application-Specific Systems, Architectures, and Processors (ASSAP'03). 
64--74."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/800046.801649"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSSC.2009.932941"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSSC.2009.932433"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/VLSID.2007.127"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/383082.383165"},{"volume-title":"Proceedings of the 3rd Workshop on Optimizations for DSP and Embedded Systems (ODES'05)","author":"Hussmann M.","key":"e_1_2_1_13_1","unstructured":"Hussmann , M. , Thies , M. , and Kastens , U . 2005. Parallelizing compilation through load-time scheduling for a superscalar processor family . In Proceedings of the 3rd Workshop on Optimizations for DSP and Embedded Systems (ODES'05) , held in conjunction with the 3rd IEEE\/ACM International Symposium on Code Generation and Optimization (CGO'05). Hussmann, M., Thies, M., and Kastens, U. 2005. Parallelizing compilation through load-time scheduling for a superscalar processor family. In Proceedings of the 3rd Workshop on Optimizations for DSP and Embedded Systems (ODES'05), held in conjunction with the 3rd IEEE\/ACM International Symposium on Code Generation and Optimization (CGO'05)."},{"volume-title":"Proceedings of the 2nd International ICST Conference on Mobile Lightweight Wireless Systems.","author":"Jungeblut T.","key":"e_1_2_1_14_1","unstructured":"Jungeblut , T. , Dreesen , R. , Porrmann , M. , Thies , M. , R\u00fcckert , U. , and Kastens , U . 2010a. A framework for the design space exploration of software-defined radio applications . In Proceedings of the 2nd International ICST Conference on Mobile Lightweight Wireless Systems. Jungeblut, T., Dreesen, R., Porrmann, M., Thies, M., R\u00fcckert, U., and Kastens, U. 2010a. A framework for the design space exploration of software-defined radio applications. 
In Proceedings of the 2nd International ICST Conference on Mobile Lightweight Wireless Systems."},{"volume-title":"Proceedings of the Electrical and Electronic Engineering for Communication Conference (EEEfCOM'09)","author":"Jungeblut T.","key":"e_1_2_1_15_1","unstructured":"Jungeblut , T. , Klassen , D. , Dreesen , R. , Porrmann , M. , Thies , M. , R\u00fcckert , U. , and Kastens , U . 2009. Design space exploration for next generation wireless technologies . In Proceedings of the Electrical and Electronic Engineering for Communication Conference (EEEfCOM'09) . Jungeblut, T., Klassen, D., Dreesen, R., Porrmann, M., Thies, M., R\u00fcckert, U., and Kastens, U. 2009. Design space exploration for next generation wireless technologies. In Proceedings of the Electrical and Electronic Engineering for Communication Conference (EEEfCOM'09)."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.5194\/ars-8-295-2010"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/NAS.2010.14"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/997163.997182"},{"volume-title":"Proceedings of the International Symposium on VLSI Design Automation and Test (VLSI-DAT'10)","author":"Lung C.","key":"e_1_2_1_19_1","unstructured":"Lung , C. , Hsiao , H. , Zeng , Z. , and Chang , S . 2010. LP-based multi-mode multi-corner clock skew optimization . In Proceedings of the International Symposium on VLSI Design Automation and Test (VLSI-DAT'10) . IEEE, 335--338. Lung, C., Hsiao, H., Zeng, Z., and Chang, S. 2010. LP-based multi-mode multi-corner clock skew optimization. In Proceedings of the International Symposium on VLSI Design Automation and Test (VLSI-DAT'10). IEEE, 335--338."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/JRPROC.1961.287814"},{"key":"e_1_2_1_21_1","volume-title":"Advances in Parallel Computing","volume":"19","author":"Porrmann M.","unstructured":"Porrmann , M. , Hagemeyer , J. , Pohl , C. , Romoth , J. 
, and Strugholtz, M. 2010. RAPTOR -- A scalable platform for rapid prototyping and FPGA-based cluster computing. In Parallel Computing: From Multicores and GPU's to Petascale, Advances in Parallel Computing, vol. 19, IOS Press, 592--599."},{"volume-title":"The H.264 Advanced Video Compression Standard","author":"Richardson I.","key":"e_1_2_1_22_1","unstructured":"Richardson, I. 2010. The H.264 Advanced Video Compression Standard. John Wiley and Sons."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2002.801617"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICVD.2005.95"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.1967.1054010"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/358274.358283"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2006.876105"}],"container-title":["ACM Transactions on Embedded Computing 
Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2514641.2514645","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2514641.2514645","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T08:39:18Z","timestamp":1750235958000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2514641.2514645"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,9]]},"references-count":27,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2013,9]]}},"alternative-id":["10.1145\/2514641.2514645"],"URL":"https:\/\/doi.org\/10.1145\/2514641.2514645","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"type":"print","value":"1539-9087"},{"type":"electronic","value":"1558-3465"}],"subject":[],"published":{"date-parts":[[2013,9]]},"assertion":[{"value":"2011-02-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2012-05-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2013-09-30","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}