{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,17]],"date-time":"2025-12-17T08:30:08Z","timestamp":1765960208535,"version":"3.41.0"},"reference-count":45,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2022,1,31]],"date-time":"2022-01-31T00:00:00Z","timestamp":1643587200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"European Research Council","award":["741097"],"award-info":[{"award-number":["741097"]}]},{"name":"FWO","award":["G.0144.17N"],"award-info":[{"award-number":["G.0144.17N"]}]},{"name":"Juan de la Cierva Formaci\u00f3n Contract","award":["FJC2018-036021-I"],"award-info":[{"award-number":["FJC2018-036021-I"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2022,6,30]]},"abstract":"<jats:p>\n            Superscalar out-of-order cores deliver high performance at the cost of increased complexity and power budget. In-order cores, in contrast, are less complex and have a smaller power budget, but offer low performance. A processor architecture should ideally provide high performance in a power- and cost-efficient manner. Recently proposed\n            <jats:bold>slice-out-of-order (sOoO)<\/jats:bold>\n            cores identify backward slices of memory operations which they execute out-of-order with respect to the rest of the dynamic instruction stream for increased instruction-level and memory-hierarchy parallelism. 
Unfortunately, constructing backward slices is imprecise and hardware-inefficient, leaving performance on the table.\n          <\/jats:p>\n          <jats:p>\n            In this article, we propose\n            <jats:bold>Forward Slice Core (FSC)<\/jats:bold>, a novel core microarchitecture that builds on a stall-on-use in-order core and extracts more instruction-level and memory-hierarchy parallelism than slice-out-of-order cores. FSC does so by identifying and steering forward slices (rather than backward slices) to dedicated in-order FIFO queues. Moreover, FSC puts load-consumers that depend on L1 D-cache misses on the side to enable younger independent load-consumers to execute faster. Finally, FSC eliminates the need for dynamic memory disambiguation by replicating store-address instructions across queues. Considering 3-wide pipeline configurations, we find that FSC improves performance by 27.1%, 21.1%, and 14.6% on average compared to Freeway, the state-of-the-art sOoO core, across SPEC CPU2017, GAP, and DaCapo, respectively, while at the same time incurring reduced hardware complexity. 
Compared to an OoO core, FSC reduces power consumption by 61.3% and chip area by 47%, providing a microarchitecture with high performance at low complexity.\n          <\/jats:p>","DOI":"10.1145\/3499424","type":"journal-article","created":{"date-parts":[[2022,1,31]],"date-time":"2022-01-31T13:43:02Z","timestamp":1643636582000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["The Forward Slice Core: A High-Performance, Yet Low-Complexity Microarchitecture"],"prefix":"10.1145","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7750-3162","authenticated-orcid":false,"given":"Kartik","family":"Lakshminarasimhan","sequence":"first","affiliation":[{"name":"Ghent University, Technologypark, Zwijnaarde, Ghent, Belgium"}]},{"given":"Ajeya","family":"Naithani","sequence":"additional","affiliation":[{"name":"Ghent University, Technologypark, Zwijnaarde, Ghent, Belgium"}]},{"given":"Josu\u00e9","family":"Feliu","sequence":"additional","affiliation":[{"name":"Universidad de Murcia, Edificio, Murcia, Spain"}]},{"given":"Lieven","family":"Eeckhout","sequence":"additional","affiliation":[{"name":"Ghent University, Technologypark, Zwijnaarde, Ghent, Belgium"}]}],"member":"320","published-online":{"date-parts":[[2022,1,31]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2003.1253246"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA47549.2020.00042"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1147\/sj.391.0211"},{"journal-title":"http:\/\/www.arm.com\/products\/processors\/cortex-a\/cortex-a7.php","article-title":"ARM Cortex-A7 Processor","key":"e_1_3_2_5_2","unstructured":"ARM. ARM Cortex-A7 Processor. 
http:\/\/www.arm.com\/products\/processors\/cortex-a\/cortex-a7.php."},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2005.1"},{"key":"e_1_3_2_7_2","first-page":"arXiv:1508.0361","article-title":"The GAP benchmark suite","author":"Beamer Scott","year":"2015","unstructured":"Scott Beamer, Krste Asanovi\u0107, and David Patterson. 2015. The GAP benchmark suite. arXiv e-prints, Article arXiv:1508.03619 (Aug. 2015), arXiv:1508.03619 pages. arxiv:cs.DC\/1508.03619.","journal-title":"arXiv e-prints"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/1167473.1167488"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/1378704.1378723"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/2749469.2750407"},{"issue":"3","key":"e_1_3_2_11_2","first-page":"28","article-title":"An evaluation of high-level mechanistic core models","volume":"11","author":"Carlson T. E.","year":"2014","unstructured":"T. E. Carlson, W. Heirman, S. Eyerman, I. Hur, and L. Eeckhout. 2014. An evaluation of high-level mechanistic core models. ACM Transactions on Architecture and Code Optimization (TACO) 11, 3 (2014), 28.","journal-title":"ACM Transactions on Architecture and Code Optimization (TACO)"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/1555754.1555814"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/2000064.2000079"},{"key":"e_1_3_2_14_2","first-page":"355","volume-title":"Proceedings of the ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Applications (OOPSLA)","author":"Bois K. Du","year":"2013","unstructured":"K. Du Bois, J. B. Sartor, S. Eyerman, and L. Eeckhout. 2013. Bottle graphs: Visualizing scalability bottlenecks in multi-threaded applications. In Proceedings of the ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Applications (OOPSLA). 
355\u2013372."},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/263580.263597"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.5555\/1855084"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2009.4798281"},{"key":"e_1_3_2_18_2","volume-title":"IBM CAS Workshop","author":"Gustafsson S. Blackburn J. Ha, M.","year":"2008","unstructured":"S. Blackburn J. Ha, M. Gustafsson and K. S. McKinley. 2008. Microarchitectural characterization of production JVMs and Java workloads. In IBM CAS Workshop."},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA47549.2020.00039"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/545214.545224"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2019.00009"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1145\/3410463.3414629"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2002.1003562"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/1669112.1669172"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCAD.2011.6105405"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2003.1183532"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA52012.2021.00024"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA47549.2020.00040"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCD.2008.4751889"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/264107.264201"},{"journal-title":"https:\/\/www.globenewswire.com\/news-release\/2020\/06\/15\/2047817\/0\/en\/5G-Devices-Market-Worth-46-Billion-by-2030-Despite-the-Impact-of-COVID-19.html","article-title":"5G Revenue and Devices Forecast","key":"e_1_3_2_31_2","unstructured":"Research and Markets. 5G Revenue and Devices Forecast. 
https:\/\/www.globenewswire.com\/news-release\/2020\/06\/15\/2047817\/0\/en\/5G-Devices-Market-Worth-46-Billion-by-2030-Despite-the-Impact-of-COVID-19.html."},{"key":"e_1_3_2_32_2","volume-title":"Proceedings of the Sixth Annual Workshop on Duplicating, Deconstructing and Debunking (WDDD), held in conjunction with ISCA","author":"Salverda P.","year":"2007","unstructured":"P. Salverda and C. Zilles. 2007. Dependence-based scheduling revisited: A tale of two baselines. In Proceedings of the Sixth Annual Workshop on Duplicating, Deconstructing and Debunking (WDDD), held in conjunction with ISCA."},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2008.4658644"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1145\/2830772.2830815"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1145\/605397.605403"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2014.35"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.5555\/800048.801719"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/1024393.1024407"},{"journal-title":"https:\/\/www.statista.com\/statistics\/802690\/worldwide-connected-devices-by-access-technology\/","article-title":"IoT Devices Forecast","key":"e_1_3_2_39_2","unstructured":"Statista. IoT Devices Forecast. https:\/\/www.statista.com\/statistics\/802690\/worldwide-connected-devices-by-access-technology\/."},{"journal-title":"https:\/\/www.statista.com\/statistics\/330695\/number-of-smartphone-users-worldwide\/","article-title":"Smartphone Users","key":"e_1_3_2_40_2","unstructured":"Statista. Smartphone Users. 
https:\/\/www.statista.com\/statistics\/330695\/number-of-smartphone-users-worldwide\/."},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2017.7863738"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1145\/3192366.3192393"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2008.23"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/2076021.2048092"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1145\/379240.379246"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1145\/339647.339676"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3499424","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3499424","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:30:37Z","timestamp":1750188637000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3499424"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,31]]},"references-count":45,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2022,6,30]]}},"alternative-id":["10.1145\/3499424"],"URL":"https:\/\/doi.org\/10.1145\/3499424","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2022,1,31]]},"assertion":[{"value":"2021-07-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-11-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2022-01-31","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}