{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:41:39Z","timestamp":1750308099890,"version":"3.41.0"},"reference-count":18,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2004,9,29]],"date-time":"2004-09-29T00:00:00Z","timestamp":1096416000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["SIGARCH Comput. Archit. News"],"published-print":{"date-parts":[[2005,6]]},"abstract":"<jats:p>Indirect memory accesses, where a load is fed by another load, are ubiquitous because of rich data structures and sophisticated software conventions, such as the use of linkage tables and position independent code. Unfortunately, they can be costly: if both loads miss, two round trips to memory are required even though the role of the first load is often limited to fetching the address of the second load. To reduce the total latency of such indirect accesses, a new instruction called load squared is introduced. A load squared does two fetches, the first fetch reading the target address of the second. (An offset is optionally added to the result of the first fetch.) The load squared operation is performed by memory-side logic (typically, the memory controller if it isn't located on the main processor chip). In this study, load squared is not an architecturally visible instruction: the micro-architecture transparently decides which loads should be replaced by loads squared. 
We show that performance is sometimes improved significantly, and never degraded.<\/jats:p>","DOI":"10.1145\/1101868.1101873","type":"journal-article","created":{"date-parts":[[2006,2,6]],"date-time":"2006-02-06T18:14:10Z","timestamp":1139249650000},"page":"17-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Load squared"],"prefix":"10.1145","volume":"33","author":[{"given":"Sami","family":"Yehia","sequence":"first","affiliation":[{"name":"ARM Ltd, Cambridge, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jean-Francois","family":"Collard","sequence":"additional","affiliation":[{"name":"Hewlett-Packard Labs, Palo Alto CA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Olivier","family":"Temam","sequence":"additional","affiliation":[{"name":"University of Paris-Sud, France"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2004,9,29]]},"reference":[{"key":"e_1_2_1_1_1","first-page":"192","volume-title":"Proc. IEEE Int'l Conf. on Comp. Design","year":"1999","unstructured":"Flexram : Toward an advanced intelligent memory system . In Proc. IEEE Int'l Conf. on Comp. Design , page 192 , 1999 . Flexram: Toward an advanced intelligent memory system. In Proc. IEEE Int'l Conf. on Comp. Design, page 192, 1999."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/300979.300984"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.5555\/520549.822749"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.5555\/774861.774869"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/605397.605427"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/2.869367"},{"key":"e_1_2_1_8_1","unstructured":"Intel Corp. Intel Itanium 2 Processor Reference Manual.  Intel Corp. 
Intel Itanium 2 Processor Reference Manual."},{"key":"e_1_2_1_9_1","first-page":"206","volume-title":"Proc. 6th Int'l Symp. on High-Perf. Comp. Arch. (HPCA'6)","author":"Karlsson M.","year":"2000","unstructured":"M. Karlsson, F. Dahlgren, and P. Stenstrom. A prefetching technique for irregular accesses to linked data structures. In Proc. 6th Int'l Symp. on High-Perf. Comp. Arch. (HPCA'6), pages 206--217, 2000."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.5555\/225160.225197"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/237090.237190"},{"key":"e_1_2_1_12_1","volume-title":"Digital WRL, june","author":"McFarling S.","year":"1993","unstructured":"S. McFarling. Combining branch predictors. 
Technical Note TN-36, Digital WRL, june 1993."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/201059.201065"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/291069.291034"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/300979.300989"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.5555\/800052.801871"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.5555\/545215.545235"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/335231.335248"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/123465.123475"}],"container-title":["ACM SIGARCH Computer Architecture News"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1101868.1101873","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1101868.1101873","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T16:08:01Z","timestamp":1750262881000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1101868.1101873"}},"subtitle":["adding logic close to memory to reduce the latency of indirect loads with high miss ratios"],"short-title":[],"issued":{"date-parts":[[2004,9,29]]},"references-count":18,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2005,6]]}},"alternative-id":["10.1145\/1101868.1101873"],"URL":"https:\/\/doi.org\/10.1145\/1101868.1101873","relation":{"is-identical-to":[{"id-type":"doi","id":"10.1145\/1152922.1101873","asserted-by":"subject"}]},"ISSN":["0163-5964"],"issn-type":[{"type":"print","value":"0163-5964"}],"subject":[],"published":{"date-parts":[[2004,9,29]]},"assertion":[{"value":"2004-09-29","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication 
History"}}]}}