{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:20:08Z","timestamp":1750306808287,"version":"3.41.0"},"reference-count":25,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2013,12,1]],"date-time":"2013-12-01T00:00:00Z","timestamp":1385856000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2013,12]]},"abstract":"<jats:p>Coarse-Grained Reconfigurable Architectures (CGRAs) present a potential of high compute throughput with energy efficiency. A CGRA consists of an array of Functional Units (FUs), which communicate with each other through an interconnect network containing transmission nodes and register files. To achieve high performance from the software solutions mapped onto CGRAs, modulo scheduling of loops is generally employed. One of the key challenges in modulo scheduling for CGRAs is to explicitly handle routings of operands from a source to a destination operations through various routing resources. Existing modulo schedulers for CGRAs are slow because finding a valid routing is generally a searching problem over a large space, even with the guidance of well-defined cost metrics. Applications in traditional embedded multimedia domains are regarded as relatively tolerant to a slow compile time in exchange for a high-quality solution. However, many rapidly growing domains of applications, such as 3D graphics, require a fast compilation. Entrances of CGRAs to these domains have been blocked mainly due to their long compile time. We attack this problem by utilizing patternized routes, for which resources and time slots for a success can be estimated in advance when a source operation is placed. By conservatively reserving predefined resources at predefined time slots, future routings originating from the source operation are guaranteed. Experiments on a real-world 3D graphics benchmark suite show that our scheduler improves the compile time up to 6,000 times while achieving an average 70% throughputs of the state-of-the-art CGRA modulo scheduler, the Edge-centric Modulo Scheduler (EMS).<\/jats:p>","DOI":"10.1145\/2541228.2555314","type":"journal-article","created":{"date-parts":[[2014,1,14]],"date-time":"2014-01-14T13:39:57Z","timestamp":1389706797000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["Fast modulo scheduler utilizing patternized routes for coarse-grained reconfigurable architectures"],"prefix":"10.1145","volume":"10","author":[{"given":"Wonsub","family":"Kim","sequence":"first","affiliation":[{"name":"Samsung Electronics, Korea"}]},{"given":"Yoonseo","family":"Choi","sequence":"additional","affiliation":[{"name":"Samsung Advanced Institute of Technology, Korea"}]},{"given":"Haewoo","family":"Park","sequence":"additional","affiliation":[{"name":"Samsung Advanced Institute of Technology, Korea"}]}],"member":"320","published-online":{"date-parts":[[2013,12]]},"reference":[{"volume-title":"Proceedings of 2012 International Conference on Field-Programmable Technology (FPT'12)","author":"Chen L.","key":"e_1_2_1_1_1","unstructured":"Chen , L. and Mitra , T . 2012. Graph minor approach for application mapping on cgras . In Proceedings of 2012 International Conference on Field-Programmable Technology (FPT'12) . IEEE, 285--292. Chen, L. and Mitra, T. 2012. Graph minor approach for application mapping on cgras. In Proceedings of 2012 International Conference on Field-Programmable Technology (FPT'12). IEEE, 285--292."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/514191.514208"},{"volume-title":"Proceedings of 5th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM'97)","author":"Ebeling C.","key":"e_1_2_1_3_1","unstructured":"Ebeling , C. , Cronquist , D. C. , Franklin , P. , Secosky , J. , and Berg , S. G . 1997. Mapping applications to the rapid configurable architecture . In Proceedings of 5th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM'97) . IEEE Computer Society, 106--115. Ebeling, C., Cronquist, D. C., Franklin, P., Secosky, J., and Berg, S. G. 1997. Mapping applications to the rapid configurable architecture. In Proceedings of 5th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM'97). IEEE Computer Society, 106--115."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/92.475966"},{"key":"e_1_2_1_5_1","unstructured":"Ellis J. R. 1985. Bulldog: A compiler for VLIW Architectures (Parallel Computing Reduced-Instruction-Set Trace Scheduling Scientific). Ph.D. thesis Yale University.   Ellis J. R. 1985. Bulldog: A compiler for VLIW Architectures (Parallel Computing Reduced-Instruction-Set Trace Scheduling Scientific). Ph.D. thesis Yale University."},{"key":"e_1_2_1_6_1","unstructured":"Fisher J. A. Faraboschi P. and Young C. 2005. Embedded Computing\u2014A VLIW Approach to Architecture Compilers and Tools. Morgan Kaufmann.  Fisher J. A. Faraboschi P. and Young C. 2005. Embedded Computing\u2014A VLIW Approach to Architecture Compilers and Tools. Morgan Kaufmann."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/300979.300982"},{"volume-title":"Proceedings of 2012 International Conference on Field-Programmable Technology (FPT'12)","author":"Kim W.","key":"e_1_2_1_8_1","unstructured":"Kim , W. , Yoo , D. , Park , H. , and Ahn , M . 2012. SCC based modulo scheduling for coarse-grained reconfigurable processors . In Proceedings of 2012 International Conference on Field-Programmable Technology (FPT'12) . IEEE, 321--328. Kim, W., Yoo, D., Park, H., and Ahn, M. 2012. SCC based modulo scheduling for coarse-grained reconfigurable processors. In Proceedings of 2012 International Conference on Field-Programmable Technology (FPT'12). IEEE, 321--328."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/566570.566641"},{"key":"e_1_2_1_10_1","volume-title":"Proceedings of the 5th International Euro-Par Conference on Parallel Processing (Euro-Par'99)","volume":"1685","author":"Lu G.","unstructured":"Lu , G. , Singh , H. , Lee , M.-H. , Bagherzadeh , N. , Kurdahi , F. J. , and Filho , E. M. C. 1999. The morphosys parallel reconfigurable system . In Proceedings of the 5th International Euro-Par Conference on Parallel Processing (Euro-Par'99) . Vol. 1685 . Springer, 727--734. Lu, G., Singh, H., Lee, M.-H., Bagherzadeh, N., Kurdahi, F. J., and Filho, E. M. C. 1999. The morphosys parallel reconfigurable system. In Proceedings of the 5th International Euro-Par Conference on Parallel Processing (Euro-Par'99). Vol. 1685. Springer, 727--734."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/201310.201328"},{"volume-title":"Proceedings of the 2002 IEEE International Conference on Field-Programmable Technology (FPT'02)","author":"Mei B.","key":"e_1_2_1_13_1","unstructured":"Mei , B. , Vernalde , S. , Verkest , D. , Man , H. D. , and Lauwereins , R . 2002. DRESC: A retargetable compiler for coarse-grained reconfigurable architectures . In Proceedings of the 2002 IEEE International Conference on Field-Programmable Technology (FPT'02) . 166--173. Mei, B., Vernalde, S., Verkest, D., Man, H. D., and Lauwereins, R. 2002. DRESC: A retargetable compiler for coarse-grained reconfigurable architectures. In Proceedings of the 2002 IEEE International Conference on Field-Programmable Technology (FPT'02). 166--173."},{"volume-title":"Proceedings of 2003 Design, Automation and Test in Europe Conference and Exposition (DATE'03)","author":"Mei B.","key":"e_1_2_1_14_1","unstructured":"Mei , B. , Vernalde , S. , Verkest , D. , Man , H. D. , and Lauwereins , R . 2003. Exploiting loop-level parallelism on coarse-grained reconfigurable architectures using modulo scheduling . In Proceedings of 2003 Design, Automation and Test in Europe Conference and Exposition (DATE'03) . IEEE Computer Society, 10296--10301. Mei, B., Vernalde, S., Verkest, D., Man, H. D., and Lauwereins, R. 2003. Exploiting loop-level parallelism on coarse-grained reconfigurable architectures using modulo scheduling. In Proceedings of 2003 Design, Automation and Test in Europe Conference and Exposition (DATE'03). IEEE Computer Society, 10296--10301."},{"volume-title":"Proceedings of the 31st Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO'98)","author":"Nystrom E.","key":"e_1_2_1_15_1","unstructured":"Nystrom , E. and Eichenberger , A. E . 1998. Effective cluster assignment for modulo scheduling . In Proceedings of the 31st Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO'98) . 103--114. Nystrom, E. and Eichenberger, A. E. 1998. Effective cluster assignment for modulo scheduling. In Proceedings of the 31st Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO'98). 103--114."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1542452.1542456"},{"volume-title":"Proceedings of the 31st Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO'98)","author":"\u00d6zer E.","key":"e_1_2_1_17_1","unstructured":"\u00d6zer , E. , Banerjia , S. , and Conte , T. M . 1998. Unified assign and schedule: A new approach to scheduling for clustered register file microarchitectures . In Proceedings of the 31st Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO'98) . 308--315. \u00d6zer, E., Banerjia, S., and Conte, T. M. 1998. Unified assign and schedule: A new approach to scheduling for clustered register file microarchitectures. In Proceedings of the 31st Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO'98). 308--315."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1454115.1454140"},{"key":"e_1_2_1_19_1","unstructured":"Peeper C. and Mitchell J. L. 2003. Introduction to the directX 9 high level shading language. In ShaderX2: Introduction and Tutorials with DirectX 9 W. Engel Ed. Wordware Plano Texas.  Peeper C. and Mitchell J. L. 2003. Introduction to the directX 9 high level shading language. In ShaderX2: Introduction and Tutorials with DirectX 9 W. Engel Ed. Wordware Plano Texas."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/192724.192731"},{"key":"e_1_2_1_21_1","unstructured":"RightWare. 2011. Basemark ES 2.0. In http:\/\/www.rightware.com\/benchmarking-software\/product-catalog.  RightWare. 2011. Basemark ES 2.0. In http:\/\/www.rightware.com\/benchmarking-software\/product-catalog."},{"key":"e_1_2_1_22_1","unstructured":"Rost R. J. Licea-Kane B. Ginsburg D. Kessenich J. M. Lichtenbelt B. Malan H. and Weiblen M. 2009. OpenGL Shading Language 3rd ed. Addison-Wesley Professional.   Rost R. J. Licea-Kane B. Ginsburg D. Kessenich J. M. Lichtenbelt B. Malan H. and Weiblen M. 2009. OpenGL Shading Language 3rd ed. Addison-Wesley Professional."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/360128.360142"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/1046192.1046196"},{"volume-title":"Proceedings of the 9th International Symposium on High-Performance Computer Architecture (HPCA'03)","author":"Terechko A.","key":"e_1_2_1_25_1","unstructured":"Terechko , A. , Thenaff , E. L. , Garg , M. , van Eijndhoven , J. T. J. , and Corp oraal, H . 2003. Inter-cluster communication models for clustered vliw processors . In Proceedings of the 9th International Symposium on High-Performance Computer Architecture (HPCA'03) . IEEE Computer Society, 354--364. Terechko, A., Thenaff, E. L., Garg, M., van Eijndhoven, J. T. J., and Corporaal, H. 2003. Inter-cluster communication models for clustered vliw processors. In Proceedings of the 9th International Symposium on High-Performance Computer Architecture (HPCA'03). IEEE Computer Society, 354--364."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/1629911.1630001"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2541228.2555314","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2541228.2555314","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T07:35:01Z","timestamp":1750232101000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2541228.2555314"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,12]]},"references-count":25,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2013,12]]}},"alternative-id":["10.1145\/2541228.2555314"],"URL":"https:\/\/doi.org\/10.1145\/2541228.2555314","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2013,12]]},"assertion":[{"value":"2013-06-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2013-11-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2013-12-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}