{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:22:10Z","timestamp":1750220530793,"version":"3.41.0"},"reference-count":41,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2021,6,8]],"date-time":"2021-06-08T00:00:00Z","timestamp":1623110400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2021,9,30]]},"abstract":"<jats:p>Achieving low load-to-use latency with low energy and storage overheads is critical for performance. Existing techniques either prefetch into the pipeline (via address prediction and validation) or provide data reuse in the pipeline (via register sharing or L0 caches). These techniques provide a range of tradeoffs between latency, reuse, and overhead.<\/jats:p>\n          <jats:p>In this work, we present a pipeline prefetching technique that achieves state-of-the-art performance and data reuse without additional data storage, data movement, or validation overheads by adding address tags to the register file. Our addition of register file tags allows us to forward (reuse) load data from the register file with no additional data movement, keep the data alive in the register file beyond the instruction\u2019s lifetime to increase temporal reuse, and coalesce prefetch requests to achieve spatial reuse. 
Further, we show that we can use the existing memory order violation detection hardware to validate prefetches and data forwards without additional overhead.<\/jats:p>\n          <jats:p>Our design achieves the performance of existing pipeline prefetching while also forwarding 32% of the loads from the register file (compared to 15% in state-of-the-art register sharing), delivering a 16% reduction in L1 dynamic energy (1.6% total processor energy), with an area overhead of less than 0.5%.<\/jats:p>","DOI":"10.1145\/3458883","type":"journal-article","created":{"date-parts":[[2021,6,8]],"date-time":"2021-06-08T16:21:19Z","timestamp":1623169279000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["Early Address Prediction"],"prefix":"10.1145","volume":"18","author":[{"given":"Ricardo","family":"Alves","sequence":"first","affiliation":[{"name":"Uppsala University, Uppsala, Sweden"}]},{"given":"Stefanos","family":"Kaxiras","sequence":"additional","affiliation":[{"name":"Uppsala University, Uppsala, Sweden"}]},{"given":"David","family":"Black-Schaffer","sequence":"additional","affiliation":[{"name":"Uppsala University, Uppsala, Sweden"}]}],"member":"320","published-online":{"date-parts":[[2021,6,8]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCD.2018.00029"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/SBAC-PAD.2017.14"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3307650.3322269"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2012.6169033"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.1999.765939"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/313817.313856"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/2024716.2024718"},{"volume-title":"Proceedings of the 25th International Symposium on Computer Architecture. 
IEEE, 142\u2013153","author":"George","key":"e_1_2_1_8_1","unstructured":"George Z. Chrysos and Joel S. Emer. 1998. Memory dependence prediction using store sets. In Proceedings of the 25th International Symposium on Computer Architecture. IEEE, 142\u2013153."},{"key":"e_1_2_1_9_1","volume-title":"SPEC CPU2006","author":"Standard Performance Evaluation Corporation","year":"2006","unstructured":"Standard Performance Evaluation Corporation. 2006. SPEC CPU2006. Retrieved from: http:\/\/www.spec.org\/cpu20066."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1147\/rd.374.0547"},{"volume-title":"Proceedings of the 32nd International Symposium on Computer Architecture (ISCA\u201905)","author":"Fahs B.","key":"e_1_2_1_11_1","unstructured":"B. Fahs, T. Rafacz, S. J. Patel, and S. S. Lumetta. 2005. Continuous optimization. In Proceedings of the 32nd International Symposium on Computer Architecture (ISCA\u201905). IEEE, 86\u201397."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.509907"},{"key":"e_1_2_1_13_1","unstructured":"Freddy Gabbay. 1996. Speculative Execution Based on Value Prediction. 
Technion-IIT Department of Electrical Engineering."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1327171.1327183"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/263580.263631"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.1998.742783"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/40.755465"},{"volume-title":"Proceedings of the 30th ACM\/IEEE International Symposium on Microarchitecture. IEEE Computer Society, 184\u2013193","author":"Kin Johnson","key":"e_1_2_1_18_1","unstructured":"Johnson Kin, Munish Gupta, and William H. Mangione-Smith. 1997. The filter cache: An energy efficient memory structure. In Proceedings of the 30th ACM\/IEEE International Symposium on Microarchitecture. IEEE Computer Society, 184\u2013193."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/SP.2019.00002"},{"key":"e_1_2_1_20_1","volume-title":"Jay B. Brockman, and Norman P. Jouppi.","author":"Li Sheng","year":"2011","unstructured":"Sheng Li, Ke Chen, Jung Ho Ahn, Jay B. Brockman, and Norman P. Jouppi. 2011. CACTI-P: Architecture-level modeling for SRAM-based structures with advanced leakage reduction techniques. In Proceedings of the International Conference on Computer-aided Design. 
IEEE Press, 694\u2013701."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/237090.237173"},{"volume-title":"Proceedings of the 29th ACM\/IEEE International Symposium on Microarchitecture. IEEE Computer Society, 226\u2013237","author":"Mikko","key":"e_1_2_1_22_1","unstructured":"Mikko H. Lipasti and John Paul Shen. 1996. Exceeding the dataflow limit via value prediction. In Proceedings of the 29th ACM\/IEEE International Symposium on Microarchitecture. IEEE Computer Society, 226\u2013237."},{"key":"e_1_2_1_23_1","volume-title":"arXiv preprint arXiv:1801.01207","author":"Lipp Moritz","year":"2018","unstructured":"Moritz Lipp, Michael Schwarz, Daniel Gruss, Thomas Prescher, Werner Haas, Stefan Mangard, Paul Kocher, Daniel Genkin, Yuval Yarom, and Mike Hamburg. 2018. Meltdown. arXiv preprint arXiv:1801.01207 (2018)."},{"key":"e_1_2_1_24_1","volume-title":"Sohi","author":"Moshovos Andreas","year":"1997","unstructured":"Andreas Moshovos, Scott E. Breach, Terani N. Vijaykumar, and Gurindar S. Sohi. 1997. Dynamic speculation and synchronization of data dependences. In ACM SIGARCH Computer Architecture News, Vol. 25. 
ACM, 181\u2013193."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/377792.377850"},{"key":"e_1_2_1_27_1","first-page":"49","article-title":"AVPP: Address-first value-next predictor with value prefetching for improving the efficiency of load value prediction","volume":"15","author":"Orosa Lois","year":"2018","unstructured":"Lois Orosa, Rodolfo Azevedo, and Onur Mutlu. 2018. AVPP: Address-first value-next predictor with value prefetching for improving the efficiency of load value prediction. ACM Trans. Archit. Code Optim. 15, 4 (2018), 49.","journal-title":"ACM Trans. Archit. Code Optim."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.5555\/3195638.3195643"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/2678373.2665742"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2014.6835952"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2015.7056018"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2016.7446105"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2749469.2749470"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2002.1176237"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2005.43"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2018.00017"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2005.48"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/L-CA.2007.15"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3123939.3123951"},{"volume-title":"Proceedings of the 24th International Symposium on Computer Architecture (ISCA\u201997)","author":"Sodani 
Avinash","key":"e_1_2_1_40_1","unstructured":"Avinash Sodani and Gurindar S. Sohi. 1997. Dynamic instruction reuse. In Proceedings of the 24th International Symposium on Computer Architecture (ISCA\u201997)."},{"volume-title":"Proceedings of the 11th International Symposium on High-performance Computer Architecture. IEEE, 5\u201315","author":"Tuck Nathan","key":"e_1_2_1_41_1","unstructured":"Nathan Tuck and Dean M. Tullsen. 2005. Multithreaded value prediction. In Proceedings of the 11th International Symposium on High-performance Computer Architecture. IEEE, 5\u201315."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.5555\/266800.266827"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3458883","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3458883","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:24:55Z","timestamp":1750195495000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3458883"}},"subtitle":["Efficient Pipeline Prefetch and 
Reuse"],"short-title":[],"issued":{"date-parts":[[2021,6,8]]},"references-count":41,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2021,9,30]]}},"alternative-id":["10.1145\/3458883"],"URL":"https:\/\/doi.org\/10.1145\/3458883","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2021,6,8]]},"assertion":[{"value":"2020-12-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-03-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-06-08","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}