{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,8]],"date-time":"2026-01-08T05:54:53Z","timestamp":1767851693028,"version":"3.49.0"},"reference-count":11,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2016,4,22]],"date-time":"2016-04-22T00:00:00Z","timestamp":1461283200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["SIGARCH Comput. Archit. News"],"published-print":{"date-parts":[[2016,4,22]]},"abstract":"<jats:p>During recent years, GPU micro-architectures have changed dramatically, evolving into powerful many-core deep-multithreaded platforms for parallel workloads. While important micro-architectural modifications continue to appear in every new generation of these processors, unfortunately, little is known about the details of these innovative designs. One of the key questions in understanding GPUs is how they deal with outstanding memory misses. Our goal in this study is to find answers to this question. To this end, we develop a set of micro-benchmarks in CUDA to understand the outstanding memory requests handling resources. Particularly, we study two NVIDIA GPGPUs (Fermi and Kepler) and estimate their capability in handling outstanding memory requests. We show that Kepler can issue nearly 32X higher number of outstanding memory requests, compared to Fermi. 
We explain this enhancement by Kepler's architectural modifications in outstanding memory request handling resources.<\/jats:p>","DOI":"10.1145\/2927964.2927968","type":"journal-article","created":{"date-parts":[[2016,4,25]],"date-time":"2016-04-25T19:51:13Z","timestamp":1461613873000},"page":"15-21","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["A Case Study in Reverse Engineering GPGPUs"],"prefix":"10.1145","volume":"43","author":[{"given":"Ahmad","family":"Lashgar","sequence":"first","affiliation":[{"name":"University of Victoria"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ebad","family":"Salehi","sequence":"additional","affiliation":[{"name":"University of Victoria"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Amirali","family":"Baniasadi","sequence":"additional","affiliation":[{"name":"University of Victoria"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2016,4,22]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2012.11"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2009.4919648"},{"key":"e_1_2_1_3_1","volume-title":"ISCA","author":"Kroft D.","year":"1981"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASAP.2014.6868644"},{"key":"e_1_2_1_5_1","unstructured":"S. Moy and J. Lindholm. Across-thread out of order instruction dispatch in a multithreaded graphics processor June 23 2005. US Patent App. 10\/742 514."},{"key":"e_1_2_1_6_1","unstructured":"NVIDIA Corp. Nvidia's next generation cuda compute architecture: Kepler gk110. Available: http:\/\/www.nvidia.ca\/content\/PDF\/kepler\/NVIDIA-Kepler-GK110-Architecture-Whitepaper.pdf."},{"key":"e_1_2_1_7_1","unstructured":"L. Nyland et al. Systems and methods for coalescing memory accesses of parallel threads Mar. 5 2013. US Patent 8 392 669."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCSim.2011.5999886"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2011.24"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2010.5452013"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/NAS.2011.51"}],"container-title":["ACM SIGARCH Computer Architecture News"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2927964.2927968","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2927964.2927968","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:56:21Z","timestamp":1750222581000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2927964.2927968"}},"subtitle":["Outstanding Memory Handling 
Resources"],"short-title":[],"issued":{"date-parts":[[2016,4,22]]},"references-count":11,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2016,4,22]]}},"alternative-id":["10.1145\/2927964.2927968"],"URL":"https:\/\/doi.org\/10.1145\/2927964.2927968","relation":{},"ISSN":["0163-5964"],"issn-type":[{"value":"0163-5964","type":"print"}],"subject":[],"published":{"date-parts":[[2016,4,22]]},"assertion":[{"value":"2016-04-22","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}