{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:49:41Z","timestamp":1750308581835,"version":"3.41.0"},"reference-count":35,"publisher":"Association for Computing Machinery (ACM)","issue":"5s","license":[{"start":{"date-parts":[[2017,9,27]],"date-time":"2017-09-27T00:00:00Z","timestamp":1506470400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100000266","name":"Engineering and Physical Research Council","doi-asserted-by":"crossref","award":["EP\/K034448\/1"],"award-info":[{"award-number":["EP\/K034448\/1"]}],"id":[{"id":"10.13039\/501100000266","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2017,10,31]]},"abstract":"<jats:p>\n            Heterogeneous multi-processors are designed to bridge the gap between performance and energy efficiency in modern embedded systems. This is achieved by pairing\n            <jats:italic>Out-of-Order (OoO)<\/jats:italic>\n            cores, yielding performance through aggressive speculation and latency masking, with\n            <jats:italic>In-Order (InO)<\/jats:italic>\n            cores, that preserve energy through simpler design. By leveraging migrations between them, workloads can therefore select the best setting for any given energy\/delay envelope. However, migrations introduce execution overheads that can hurt performance if they happen too frequently. Finding the optimal migration frequency is critical to maximize energy savings while maintaining acceptable performance. We develop a simulation methodology that can 1) isolate the hardware effects of migrations from the software, 2) directly compare the performance of different core types, 3) quantify the performance degradation and 4) calculate the cost of migrations for each case. To showcase our methodology we run mibench, a microbenchmark suite, and show that migrations can happen as fast as every 100k instructions with little performance loss. We also show that, contrary to numerous recent studies, hypothetical designs do not need to share all of their internal components to be able to migrate at that frequency. Instead, we propose a feasible system that shares level 2 caches and a\n            <jats:italic>translation lookaside buffer<\/jats:italic>\n            that matches performance and efficiency. Our results show that there are phases comprising up to 10% that a migration to the OoO core leads to performance benefits without any additional energy cost when running on the InO core, and up to 6% of phases where a migration to the InO core can save energy without affecting performance. When considering a policy that focuses on improving the energy-delay product, results show that on average 66% of the phases can be migrated to deliver equal or better system operation without having to aggressively share the entire memory system or to revert to migration periods finer than 100k instructions.\n          <\/jats:p>","DOI":"10.1145\/3126544","type":"journal-article","created":{"date-parts":[[2017,9,27]],"date-time":"2017-09-27T12:33:53Z","timestamp":1506515633000},"page":"1-16","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Nucleus"],"prefix":"10.1145","volume":"16","author":[{"given":"Ilias","family":"Vougioukas","sequence":"first","affiliation":[{"name":"ARM Research and University of Southampton, Southampton, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Andreas","family":"Sandberg","sequence":"additional","affiliation":[{"name":"ARM Research, Cambridge, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Stephan","family":"Diestelhorst","sequence":"additional","affiliation":[{"name":"ARM Research, Cambridge, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bashir M.","family":"Al-Hashimi","sequence":"additional","affiliation":[{"name":"University of Southampton, Southampton, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Geoff V.","family":"Merrett","sequence":"additional","affiliation":[{"name":"University of Southampton, Southampton, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2017,9,27]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"big.LITTLE technology: The future of mobile. ARM White Paper","author":"ARM.","year":"2013","unstructured":"ARM. 2013. big.LITTLE technology: The future of mobile. ARM White Paper ( 2013 ), 12. https:\/\/www.arm.com\/files\/pdf\/big_LITTLE_Technology_the_Future_of_Mobile.pdf. ARM. 2013. big.LITTLE technology: The future of mobile. ARM White Paper (2013), 12. https:\/\/www.arm.com\/files\/pdf\/big_LITTLE_Technology_the_Future_of_Mobile.pdf."},{"volume-title":"2011 IEEE 17th International Symposium on High Performance Computer Architecture. 62--63","author":"Bhattacharjee A.","key":"e_1_2_1_2_1","unstructured":"A. Bhattacharjee , D. Lustig , and M. Martonosi . 2011. Shared last-level TLBs for chip multiprocessors . In 2011 IEEE 17th International Symposium on High Performance Computer Architecture. 62--63 . A. Bhattacharjee, D. Lustig, and M. Martonosi. 2011. Shared last-level TLBs for chip multiprocessors. In 2011 IEEE 17th International Symposium on High Performance Computer Architecture. 62--63."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2009.26"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/2024716.2024718"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2004.839485(410) 24"},{"key":"e_1_2_1_6_1","unstructured":"Hongsuk Chung Munsik Kang and Hyun-Duk Cho. 2013. Heterogeneous multi-processing solution of exynos 5 octa with ARM\u00ae big.LITTLE\u2122 Technology. (2013) 1--8. https:\/\/www.arm.com\/files\/pdf\/Heterogeneous_Multi_Processing_Solution_of_Exynos_5_Octa_with_ARM_bigLITTLE_Technology.pdf.  Hongsuk Chung Munsik Kang and Hyun-Duk Cho. 2013. Heterogeneous multi-processing solution of exynos 5 octa with ARM\u00ae big.LITTLE\u2122 Technology. (2013) 1--8. https:\/\/www.arm.com\/files\/pdf\/Heterogeneous_Multi_Processing_Solution_of_Exynos_5_Octa_with_ARM_bigLITTLE_Technology.pdf."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1105734.1105745"},{"key":"e_1_2_1_8_1","doi-asserted-by":"crossref","unstructured":"Robert H. Dennard Jin Cai and Arvind Kumar. 2007. A perspective on today\u2019s scaling challenges and possible future directions. Solid-State Electronics 51 4 SPEC. ISS. (2007) 518--525.  Robert H. Dennard Jin Cai and Arvind Kumar. 2007. A perspective on today\u2019s scaling challenges and possible future directions. Solid-State Electronics 51 4 SPEC. ISS. (2007) 518--525.","DOI":"10.1016\/j.sse.2007.02.004"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2189750.2151004"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2012.17"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1952998.1952999"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCD.2014.6974710"},{"volume-title":"2015 IEEE Hot Chips 27 Symposium, HCS 2015. IEEE, 1--1.","author":"Forbes Elliott","key":"e_1_2_1_13_1","unstructured":"Elliott Forbes , Zhenqian Zhang , Randy Widialaksono , Brandon Dwiel , Rangeen Basu Roy Chowdhury , Vinesh Srinivasan , Steve Lipa , Eric Rotenberg , W. Rhett Davis , and Paul D. Franzon . 2016. Under 100-cycle thread migration latency in a single-ISA heterogeneous multi-core processor . In 2015 IEEE Hot Chips 27 Symposium, HCS 2015. IEEE, 1--1. Elliott Forbes, Zhenqian Zhang, Randy Widialaksono, Brandon Dwiel, Rangeen Basu Roy Chowdhury, Vinesh Srinivasan, Steve Lipa, Eric Rotenberg, W. Rhett Davis, and Paul D. Franzon. 2016. Under 100-cycle thread migration latency in a single-ISA heterogeneous multi-core processor. In 2015 IEEE Hot Chips 27 Symposium, HCS 2015. IEEE, 1--1."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.5555\/89657"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.5555\/1128020.1128563"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/SAMOS.2014.6893211"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2903150.2908078"},{"key":"e_1_2_1_18_1","unstructured":"Khubaib. 2014.\n  Performance and Energy Efficiency via an Adaptive MorphCore Architecture\n  . Ph.D. Dissertation. \n  University of Texas Austin\n  .  Khubaib. 2014. Performance and Energy Efficiency via an Adaptive MorphCore Architecture. Ph.D. Dissertation. University of Texas Austin."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/1281700.1281702"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/2628071.2628078"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2012.37"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2015.2419669"},{"key":"e_1_2_1_23_1","volume-title":"Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT. IEEE, 133--144","author":"Navada Sandeep","year":"2013","unstructured":"Sandeep Navada , Niket K. Choudhary , Salil V. Wadhavkar , and Eric Rotenberg . 2013 . A unified view of non-monotonic core selection and application steering in heterogeneous chip multiprocessors . In Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT. IEEE, 133--144 . Sandeep Navada, Niket K. Choudhary, Salil V. Wadhavkar, and Eric Rotenberg. 2013. A unified view of non-monotonic core selection and application steering in heterogeneous chip multiprocessors. In Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT. IEEE, 133--144."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/2830772.2830791"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1555754.1555793"},{"volume-title":"2013 IEEE 31st International Conference on Computer Design, ICCD 2013. IEEE, 154--168","author":"Rotenberg Eric","key":"e_1_2_1_26_1","unstructured":"Eric Rotenberg , Brandon H. Dwiel , Elliott Forbes , Zhenqian Zhang , Randy Widialaksono , Rangeen Basu Roy Chowdhury , Nyunyi Tshibangu , Steve Lipa , W. Rhett Davis , and Paul D. Franzon . 2013. Rationale for a 3D heterogeneous multi-core processor . In 2013 IEEE 31st International Conference on Computer Design, ICCD 2013. IEEE, 154--168 . Eric Rotenberg, Brandon H. Dwiel, Elliott Forbes, Zhenqian Zhang, Randy Widialaksono, Rangeen Basu Roy Chowdhury, Nyunyi Tshibangu, Steve Lipa, W. Rhett Davis, and Paul D. Franzon. 2013. Rationale for a 3D heterogeneous multi-core processor. In 2013 IEEE 31st International Conference on Computer Design, ICCD 2013. IEEE, 154--168."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/2832087.2832095"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2015.29"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/1531793.1531804"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2015.2430861"},{"volume-title":"2013 IEEE International Symposium on Workload Characterization (IISWC). 113--122","author":"Sunwoo D.","key":"e_1_2_1_31_1","unstructured":"D. Sunwoo , W. Wang , M. Ghosh , C. Sudanthi , G. Blake , C. D. Emmons , and N. C. Paver . 2013. A structured approach to the simulation, analysis and characterization of smartphone applications . In 2013 IEEE International Symposium on Workload Characterization (IISWC). 113--122 . D. Sunwoo, W. Wang, M. Ghosh, C. Sudanthi, G. Blake, C. D. Emmons, and N. C. Paver. 2013. A structured approach to the simulation, analysis and characterization of smartphone applications. In 2013 IEEE International Symposium on Workload Characterization (IISWC). 113--122."},{"key":"e_1_2_1_32_1","volume-title":"The context-switch overhead inflicted by hardware interrupts (and the enigma of do-nothing loops) general. ExpCS","author":"Tsafrir Dan","year":"2007","unstructured":"Dan Tsafrir . 2007. The context-switch overhead inflicted by hardware interrupts (and the enigma of do-nothing loops) general. ExpCS ( 2007 ), 13--14. https:\/\/pdfs.semanticscholar.org\/86f8\/a42a44b82cf76dcfe023209cfa4cdc0c8981.pdf. Dan Tsafrir. 2007. The context-switch overhead inflicted by hardware interrupts (and the enigma of do-nothing loops) general. ExpCS (2007), 13--14. https:\/\/pdfs.semanticscholar.org\/86f8\/a42a44b82cf76dcfe023209cfa4cdc0c8981.pdf."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0129626415410066"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/1084834.1084864"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISOCC.2013.6864009"}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3126544","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3126544","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T19:05:02Z","timestamp":1750273502000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3126544"}},"subtitle":["Finding the Sharing Limit of Heterogeneous Cores"],"short-title":[],"issued":{"date-parts":[[2017,9,27]]},"references-count":35,"journal-issue":{"issue":"5s","published-print":{"date-parts":[[2017,10,31]]}},"alternative-id":["10.1145\/3126544"],"URL":"https:\/\/doi.org\/10.1145\/3126544","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"type":"print","value":"1539-9087"},{"type":"electronic","value":"1558-3465"}],"subject":[],"published":{"date-parts":[[2017,9,27]]},"assertion":[{"value":"2017-05-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2017-07-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2017-09-27","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}