{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,17]],"date-time":"2025-12-17T08:30:14Z","timestamp":1765960214380,"version":"3.41.0"},"reference-count":50,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2022,3,7]],"date-time":"2022-03-07T00:00:00Z","timestamp":1646611200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Knut and Alice Wallenberg Foundation through the Wallenberg Academy Fellows Program"},{"name":"European Research Council (ERC) under the European Union\u2019s Horizon 2020 research and innovation program","award":["715283"],"award-info":[{"award-number":["715283"]}]},{"DOI":"10.13039\/501100005416","name":"Research Council of Norway","doi-asserted-by":"crossref","award":["302279"],"award-info":[{"award-number":["302279"]}],"id":[{"id":"10.13039\/501100005416","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2022,6,30]]},"abstract":"<jats:p>\n            Exploiting memory-level parallelism (MLP) is crucial to hide long memory and last-level cache access latencies. While out-of-order (OoO) cores, and techniques building on them, are effective at exploiting MLP, they deliver poor energy efficiency due to their complex and energy-hungry hardware. This work revisits slice-out-of-order (sOoO) cores as an energy-efficient alternative for MLP exploitation. sOoO cores achieve energy efficiency by constructing and executing\n            <jats:italic>slices<\/jats:italic>\n            of MLP-generating instructions out-of-order only with respect to the rest of instructions; the slices and the remaining instructions, by themselves, execute in-order. 
However, we observe that existing sOoO cores miss significant MLP opportunities due to their dependence-oblivious in-order slice execution, which causes dependent slices to frequently block MLP generation. To boost MLP generation, we introduce Freeway, a sOoO core based on a new dependence-aware slice execution policy that tracks dependent slices and keeps them from blocking subsequent independent slices and MLP extraction. The proposed core incurs minimal area and power overheads, yet approaches the MLP benefits of fully OoO cores. Our evaluation shows that Freeway delivers 12% better performance than the state-of-the-art sOoO core and is within 7% of the MLP limits of full OoO execution.\n          <\/jats:p>","DOI":"10.1145\/3506704","type":"journal-article","created":{"date-parts":[[2022,3,8]],"date-time":"2022-03-08T07:19:56Z","timestamp":1646723996000},"page":"1-28","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Dependence-aware Slice Execution to Boost MLP in Slice-out-of-order Cores"],"prefix":"10.1145","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6306-304X","authenticated-orcid":false,"given":"Rakesh","family":"Kumar","sequence":"first","affiliation":[{"name":"Norwegian University of Science and Technology (NTNU), Trondheim, Norway"}]},{"given":"Mehdi","family":"Alipour","sequence":"additional","affiliation":[{"name":"Ericsson Research, Mobilv\u00e4gen, Lund, Sweden"}]},{"given":"David","family":"Black-Schaffer","sequence":"additional","affiliation":[{"name":"Uppsala University, Uppsala, 
Sweden"}]}],"member":"320","published-online":{"date-parts":[[2022,3,7]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA47549.2020.00042"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.23919\/DATE.2019.8715034"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/379240.379251"},{"key":"e_1_3_1_5_2","unstructured":"ARM. ARM Cortex-A7 Processor. [n.d.]. Retrieved from http:\/\/www.arm.com\/products\/processors\/cortex-a\/cortex-a7.php."},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/2629677"},{"key":"e_1_3_1_7_2","first-page":"272","volume-title":"Proceedings of the International Symposium on Computer Architecture (ISCA\u201915)","year":"2015","unstructured":"Trevor E. Carlson, Wim Heirman, Osman Allam, Stefanos Kaxiras, and Lieven Eeckhout. 2015. The load slice core microarchitecture. In Proceedings of the International Symposium on Computer Architecture (ISCA\u201915). ACM, New York, NY, 272\u2013284."},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/2063384.2063454"},{"key":"e_1_3_1_9_2","first-page":"306","volume-title":"Proceedings of the 34th Annual ACM\/IEEE International Symposium on Microarchitecture (MICRO\u201901)","year":"2001","unstructured":"Jamison D. Collins, Dean M. Tullsen, Hong Wang, and John P. Shen. 2001. Dynamic speculative precomputation. In Proceedings of the 34th Annual ACM\/IEEE International Symposium on Microarchitecture (MICRO\u201901). 
IEEE Computer Society, Washington, DC, USA, 306\u2013317."},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/379240.379248"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2001.991128"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/263580.263597"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.1998.650557"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2016.7783764"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/2830772.2830812"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2018.00019"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA47549.2020.00039"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/264107.264207"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/325164.325162"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/1950365.1950411"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/605397.605415"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1145\/1012268.1012270"},{"key":"e_1_3_1_23_2","first-page":"737","volume-title":"Proceedings of the 22nd International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201917)","year":"2017","unstructured":"Jinchun Kim, Elvira Teran, Paul V. Gratz, D. Jim\u00e9nez, Seth H. Pugsley, and C. Wilkerson. 2017. Kill the program counter: Reconstructing program behavior in the processor cache hierarchy. In Proceedings of the 22nd International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201917). 
737\u2013749."},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/2464996.2465012"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2019.00009"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/379240.379259"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/3410463.3414629"},{"key":"e_1_3_1_28_2","first-page":"59","volume-title":"Proceedings of the 29th Annual International Symposium on Computer Architecture (ISCA\u201902)","year":"2002","unstructured":"A. Lebeck, Tong Li, E. Rotenberg, J. Koppanalil, and J. Patwardhan. 2002. A large, fast instruction window for tolerating cache misses. In Proceedings of the 29th Annual International Symposium on Computer Architecture (ISCA\u201902). IEEE Computer Society, Washington, DC, 59\u201370. Retrieved from http:\/\/dl.acm.org\/citation.cfm?id=545215.545223."},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCAD.2011.6105405"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2005.18"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/379240.379250"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/1065010.1065034"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/377792.377856"},{"volume-title":"Proceedings of the International Symposium on High-Performance Computer Architecture (HPCA\u201903)","year":"2003","key":"e_1_3_1_34_2","unstructured":"O. Mutlu, J. Stark, C. Wilkerson, and Y. Patt. 2003. Runahead execution: An alternative to very large instruction windows for out-of-order processors. 
In Proceedings of the International Symposium on High-Performance Computer Architecture (HPCA\u201903)."},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA47549.2020.00040"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1145\/264107.264201"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/191995.192014"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/291069.291034"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/2830772.2830815"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/605397.605403"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2014.35"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2006.38"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/1555754.1555766"},{"key":"e_1_3_1_44_2","unstructured":"SPEC. [n.d.]. SPEC CPU2006. Retrieved from http:\/\/www.spec.org\/cpu2006\/."},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2018.00031"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2009.4919652"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2005.50"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1145\/2155620.2155672"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1145\/216585.216588"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2007.346187"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1145\/379240.379246"}],"container-title":["ACM Transactions on Architecture and Code 
Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3506704","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3506704","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:11:50Z","timestamp":1750191110000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3506704"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,7]]},"references-count":50,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2022,6,30]]}},"alternative-id":["10.1145\/3506704"],"URL":"https:\/\/doi.org\/10.1145\/3506704","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2022,3,7]]},"assertion":[{"value":"2021-05-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-12-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-03-07","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}