{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:20:08Z","timestamp":1750306808169,"version":"3.41.0"},"reference-count":31,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2013,12,1]],"date-time":"2013-12-01T00:00:00Z","timestamp":1385856000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100004085","name":"Ministry of Education, Science and Technology","doi-asserted-by":"publisher","award":["2010-0011534"],"award-info":[{"award-number":["2010-0011534"]}],"id":[{"id":"10.13039\/501100004085","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003725","name":"National Research Foundation of Korea","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100003725","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003621","name":"Ministry of Science, ICT and Future Planning","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100003621","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2013,12]]},"abstract":"<jats:p>\n            Control divergence poses many problems in parallelizing loops. While predicated execution is commonly used to convert control dependence into data dependence, it often incurs high overhead because it allocates resources equally for both branches of a conditional statement regardless of their execution frequencies. 
For those loops with\n            <jats:italic>unbalanced<\/jats:italic>\n            conditionals, we propose a software transformation that divides a loop into two or three smaller loops so that the condition is evaluated only in the first loop, while the less frequent branch is executed in the second loop in a way that is much more efficient than in the original loop. To reduce the overhead of extra data transfer caused by the loop fission, we also present a hardware extension for a class of Coarse-Grained Reconfigurable Architectures (CGRAs). Our experiments using MiBench and computer vision benchmarks on a CGRA demonstrate that our techniques can improve the performance of loops over predicated execution by up to 65% (37.5%, on average), when the hardware extension is enabled. Without any hardware modification, our software-only version can improve performance by up to 64% (33%, on average), while simultaneously reducing the energy consumption of the entire CGRA including configuration and data memory by 22%, on average.\n          <\/jats:p>","DOI":"10.1145\/2541228.2555317","type":"journal-article","created":{"date-parts":[[2014,1,14]],"date-time":"2014-01-14T13:39:57Z","timestamp":1389706797000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Evaluator-executor transformation for efficient pipelining of loops with conditionals"],"prefix":"10.1145","volume":"10","author":[{"given":"Yeonghun","family":"Jeong","sequence":"first","affiliation":[{"name":"LG Electronics*, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Seongseok","family":"Seo","sequence":"additional","affiliation":[{"name":"UNIST, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jongeun","family":"Lee","sequence":"additional","affiliation":[{"name":"UNIST, 
Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2013,12]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Compilers: Principles, Techniques, and Tools","author":"Aho A. V.","year":"2006","unstructured":"Aho, A. V., Lam, M. S., Sethi, R., and Ullman, J. D. 2006. Compilers: Principles, Techniques, and Tools, 2nd ed. Addison-Wesley Longman, Boston, MA.","edition":"2"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/2.982917"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/2366231.2337166"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/43.62794"},{"key":"e_1_2_1_5_1","doi-asserted-by":"crossref","unstructured":"Cardoso J. M. P. and Diniz P. C. 2009. Compilation Techniques for Reconfigurable Architectures. Springer.","DOI":"10.1007\/978-0-387-09671-1"},{"volume-title":"Proceedings of the International Conference on Field-Programmable Technology (FPT\u201912)","author":"Chen L.","key":"e_1_2_1_6_1","unstructured":"Chen, L. and Mitra, T. 2012. Graph minor approach for application mapping on CGRAs. In Proceedings of the International Conference on Field-Programmable Technology (FPT\u201912). 285--292."},
{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/2155620.2155676"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/24039.24041"},{"volume-title":"Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture (HPCA\u201911)","author":"Fung W. W. L.","key":"e_1_2_1_9_1","unstructured":"Fung, W. W. L. and Aamodt, T. M. 2011. Thread block compaction for efficient SIMT control flow. In Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture (HPCA\u201911). IEEE Computer Society, Washington, DC, 25--36."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/2155620.2155623"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/378239.378481"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.5555\/1128020.1128563"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/2228360.2228600"},{"volume-title":"Proceedings of the Conference on Design, Automation and Test in Europe (DATE\u201913)","author":"Han K.","key":"e_1_2_1_14_1","unstructured":"Han, K., Choi, K., and Lee, J. 2013. Compiling control-intensive loops for CGRAs with state-based full predication. In Proceedings of the Conference on Design, Automation and Test in Europe (DATE\u201913). EDA Consortium, San Jose, CA, 1579--1582."},
{"volume-title":"Proceedings of the 2010 International Conference on Field-Programmable Technology (FPT\u201910)","author":"Han K.","key":"e_1_2_1_15_1","unstructured":"Han, K., Paek, J. K., and Choi, K. 2010. Acceleration of control flow on CGRA using advanced predicated execution. In Proceedings of the 2010 International Conference on Field-Programmable Technology (FPT\u201910). 429--432."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.5555\/367072.367839"},{"volume-title":"Proceedings of the European Design and Test Conference (ED TC\u201995)","author":"Holtmann U.","key":"e_1_2_1_17_1","unstructured":"Holtmann, U. and Ernst, R. 1995. Combining MBP-speculative computation and loop pipelining in high-level synthesis. In Proceedings of the European Design and Test Conference (ED TC\u201995). 550--556."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-28365-9_4"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2086696.2086711"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/MDT.2003.1173050"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/223982.225965"},
{"key":"e_1_2_1_22_1","series-title":"Lecture Notes in Computer Science","volume-title":"ADRES: An architecture with tightly coupled VLIW processor and coarse-grained reconfigurable matrix. In Field Programmable Logic and Application. Number 2778","author":"Mei B.","year":"2003","unstructured":"Mei, B., Vernalde, S., and Verkest, D. 2003a. ADRES: An architecture with tightly coupled VLIW processor and coarse-grained reconfigurable matrix. In Field Programmable Logic and Application. Number 2778, Lecture Notes in Computer Science. Springer, Berlin, 61--70. Retrieved from http:\/\/link.springer.com\/chapter\/10.1007\/978-3-540-45234-8_7."},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of the Conference on Design, Automation and Test in Europe (DATE\u201903)","volume":"1","author":"Mei B.","unstructured":"Mei, B., Vernalde, S., Verkest, D., De Man, H., and Lauwereins, R. 2003b. Exploiting loop-level parallelism on coarse-grained reconfigurable architectures using modulo scheduling. In Proceedings of the Conference on Design, Automation and Test in Europe (DATE\u201903), Vol. 1. IEEE Computer Society, Washington, DC."},{"key":"e_1_2_1_24_1","doi-asserted-by":"crossref","unstructured":"Paar A. Anido M. and Bagherzadeh N. 2002. A novel predication scheme for a SIMD system-on-chip. In Proceedings European Conference on Parallel Processing (Euro-Par\u201902) B. Monien and R. Feldmann Eds. Lecture Notes in Computer Science Vol. 2400. Springer Berlin 834--843.","DOI":"10.1007\/3-540-45706-2_118"},
{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1454115.1454140"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/192724.192731"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/2366231.2337167"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0065-2458(08)60707-X"},{"key":"e_1_2_1_29_1","unstructured":"Thoziyoor S. Muralimanohar N. Ahn J. H. and Jouppi N. P. 2008. CACTI 5.1. HP Laboratories."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2009.5306794"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/1105734.1105748"}],"container-title":["ACM Transactions on Architecture and Code 
Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2541228.2555317","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2541228.2555317","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T07:35:01Z","timestamp":1750232101000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2541228.2555317"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,12]]},"references-count":31,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2013,12]]}},"alternative-id":["10.1145\/2541228.2555317"],"URL":"https:\/\/doi.org\/10.1145\/2541228.2555317","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2013,12]]},"assertion":[{"value":"2013-04-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2013-11-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2013-12-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}