{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,18]],"date-time":"2025-11-18T12:14:22Z","timestamp":1763468062311,"version":"3.41.0"},"reference-count":27,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2012,1,1]],"date-time":"2012-01-01T00:00:00Z","timestamp":1325376000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001459","name":"Ministry of Education - Singapore","doi-asserted-by":"publisher","award":["MOE2009-T2-1-033"],"award-info":[{"award-number":["MOE2009-T2-1-033"]}],"id":[{"id":"10.13039\/501100001459","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2012,1]]},"abstract":"<jats:p>Computing systems have made an irreversible transition towards parallel architectures with the emergence of multi-cores. Moreover, power and thermal limits in embedded systems mandate the deployment of many simpler cores rather than a few complex cores on chip. Consumer electronic devices, on the other hand, need to support an ever-changing set of diverse applications with varying performance demands. While some applications can benefit from thread-level parallelism offered by multi-core solutions, there still exist a large number of applications with substantial amount of sequential code. The sequential programs suffer from limited exploitation of instruction-level parallelism in simple cores. We propose a reconfigurable multi-core architecture, called Bahurupi, that can successfully reconcile the conflicting demands of instruction-level and thread-level parallelism. 
Bahurupi can accelerate the performance of serial code by dynamically forming coalition of two or more simple cores to offer increased instruction-level parallelism. In particular, Bahurupi can efficiently merge 2-4 simple 2-way out-of-order cores to reach or even surpass the performance of more complex and power-hungry 4-way or 8-way out-of-order core. Compared to baseline 2-way core, quad-core Bahurupi achieves up to 5.61 speedup (average 4.08 speedup) for embedded workloads. On an average, quad-core Bahurupi achieves 17% performance improvement and 43% improvement in energy consumption compared to 8-way out-of-order baseline core on a diverse set of embedded benchmark applications.<\/jats:p>","DOI":"10.1145\/2086696.2086701","type":"journal-article","created":{"date-parts":[[2012,1,24]],"date-time":"2012-01-24T16:47:14Z","timestamp":1327423634000},"page":"1-21","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":29,"title":["Bahurupi"],"prefix":"10.1145","volume":"8","author":[{"given":"Mihai","family":"Pricopi","sequence":"first","affiliation":[{"name":"National University of Singapore"}]},{"given":"Tulika","family":"Mitra","sequence":"additional","affiliation":[{"name":"National University of Singapore"}]}],"member":"320","published-online":{"date-parts":[[2012,1,26]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/2.982917"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2005.51"},{"key":"e_1_2_1_3_1","unstructured":"Baldawa S. 2007. CMPSIM: A flexible multiprocessor simulation environment. M.S. Dissertation The University of Texas at Dallas."},
{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/1454115.1454128"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/339647.339657"},{"key":"e_1_2_1_6_1","unstructured":"Chiou D. Devadas S. Rudolph L. and Ang B. 1999. Dynamic cache partitioning via columnization. Tech. rep. MIT."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/2000064.2000067"},{"key":"e_1_2_1_8_1","unstructured":"Culler D. E. and Singh J. P. 1999. Parallel Computer Architecture: A Hardware\/Software Approach. Elsevier Morgan Kaufmann. p. 337."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2010.13"},{"key":"e_1_2_1_10_1","unstructured":"FreePDK 2011. FreePDK. http:\/\/www.eda.ncsu.edu\/wiki\/FreePDK."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2008.209"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/1250662.1250686"},{"key":"e_1_2_1_13_1","unstructured":"Jerraya A. and Wolf W. 2005. Multiprocessor Systems-on-Chip. Elsevier Morgan Kaufmann."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.5555\/1331699.1331733"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2004.12"},{"volume-title":"Proceedings of the 31st Annual International Symposium on Computer Architecture (ISCA). 64--75","author":"Kumar R.","key":"e_1_2_1_16_1","unstructured":"Kumar, R., Tullsen, D. M., Ranganathan, P., Jouppi, N. P., and Farkas, K. I. 2004b. Single-ISA heterogeneous multi-core architectures for multithreaded workload performance. In Proceedings of the 31st Annual International Symposium on Computer Architecture (ISCA). 64--75."},
{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2024724.2024954"},{"volume-title":"Proceedings of the 14th International Symposium on High Performance Computer Architecture (HPCA). 252--263","author":"Salverda P.","key":"e_1_2_1_18_1","unstructured":"Salverda, P. and Zilles, C. 2008. Fundamental performance constraints in horizontal fusion of in-order cores. In Proceedings of the 14th International Symposium on High Performance Computer Architecture (HPCA). 252--263."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/859618.859667"},{"key":"e_1_2_1_20_1","unstructured":"Shivakumar P. Jouppi N. P. and Shivakumar P. 2001. CACTI 3.0: An integrated cache timing power and area model. Tech. rep. HP Labs."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/223982.224451"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1508244.1508274"},{"key":"e_1_2_1_23_1","unstructured":"Synopsys 2010. Synopsys. http:\/\/www.synopsys.com."},
{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/1391469.1391666"},{"key":"e_1_2_1_25_1","unstructured":"Vajapeyam S. Rychlik B. and Shen J. P. 2008. Dependence-chain processing using trace descriptors having dependency descriptors. Intel Corporation Patent No.: US 7363467 B2."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/106972.106991"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2007.346182"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2086696.2086701","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2086696.2086701","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T10:06:42Z","timestamp":1750241202000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2086696.2086701"}},"subtitle":["A polymorphic heterogeneous multi-core architecture"],"short-title":[],"issued":{"date-parts":[[2012,1]]},"references-count":27,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2012,1]]}},"alternative-id":["10.1145\/2086696.2086701"],"URL":"https:\/\/doi.org\/10.1145\/2086696.2086701","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2012,1]]},"assertion":[{"value":"2011-07-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2011-11-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2012-01-26","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}