{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T03:26:55Z","timestamp":1773804415605,"version":"3.50.1"},"reference-count":26,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2017,12,5]],"date-time":"2017-12-05T00:00:00Z","timestamp":1512432000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100004359","name":"Swedish Research Council","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100004359","id-type":"DOI","asserted-by":"crossref"}]},{"name":"ACE"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2017,12,31]]},"abstract":"<jats:p>Reducing the precision of floating-point values can improve performance and\/or reduce energy expenditure in computer graphics, among other, applications. However, reducing the precision level of floating-point values in a controlled fashion needs support both at the compiler and at the microarchitecture level. At the compiler level, a method is needed to automate the reduction of precision of each floating-point value. At the microarchitecture level, a lower precision of each floating-point register can allow more floating-point values to be packed into a register file. This, however, calls for new register file organizations.<\/jats:p>\n          <jats:p>This article proposes an automated precision-selection method and a novel GPU register file organization that can store floating-point register values at arbitrary precisions densely. The automated precision-selection method uses a data-driven approach for setting the precision level of floating-point values, given a quality threshold and a representative set of input data. 
By allowing a small, but acceptable, degradation in output quality, our method can remove a significant amount of the bits needed to represent floating-point values in the investigated kernels (between 28% and 60%). Our proposed register file organization exploits these lower-precision floating-point values by packing several of them into the same physical register. This reduces the register pressure per thread by up to 48%, and by 27% on average, for a negligible output-quality degradation. This can enable GPUs to keep up to twice as many threads in flight simultaneously.<\/jats:p>","DOI":"10.1145\/3151032","type":"journal-article","created":{"date-parts":[[2017,12,6]],"date-time":"2017-12-06T21:23:15Z","timestamp":1512595395000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":13,"title":["A Framework for Automated and Controlled Floating-Point Accuracy Reduction in Graphics Applications on GPUs"],"prefix":"10.1145","volume":"14","author":[{"given":"Alexandra","family":"Angerd","sequence":"first","affiliation":[{"name":"Chalmers University of Technology, G\u00f6teborg, Sweden"}]},{"given":"Erik","family":"Sintorn","sequence":"additional","affiliation":[{"name":"Chalmers University of Technology, G\u00f6teborg, Sweden"}]},{"given":"Per","family":"Stenstr\u00f6m","sequence":"additional","affiliation":[{"name":"Chalmers University of Technology, G\u00f6teborg, Sweden"}]}],"member":"320","published-online":{"date-parts":[[2017,12,5]]},"reference":[{"key":"e_1_2_2_1_1","volume-title":"Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture. 
589--600","author":"Abdel-Majeed M."},{"key":"e_1_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/1401032.1401061"},{"key":"e_1_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1124713.1124716"},{"key":"e_1_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2004.29"},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2013.6522330"},{"key":"e_1_2_2_6_1","volume-title":"Proceedings of the 2016 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201916)","author":"Jain A."},{"key":"e_1_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/2830772.2830784"},{"key":"e_1_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/MDAT.2015.2500185"},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2005.3"},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/2464996.2465018"},{"key":"e_1_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.5555\/977395.977673"},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/2749469.2750417"},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/2660193.2660231"},{"key":"e_1_2_2_14_1","unstructured":"NVIDIA. 2016. NVIDIA Tesla P100. White Paper. (2016)."},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2786805.2786807"},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2018323.2018349"},{"key":"e_1_2_2_17_1","volume-title":"Shadertoy. 
Retrieved","author":"Quilez I.","year":"2017"},{"key":"e_1_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/2503210.2503296"},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2540708.2540711"},{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/1993498.1993518"},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2370816.2370864"},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/92.845894"},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1364\/JOSA.57.001105"},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/2540708.2540710"},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISVLSI.2007.27"},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2003.819861"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3151032","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3151032","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T02:13:39Z","timestamp":1750212819000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3151032"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,12,5]]},"references-count":26,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2017,12,31]]}},"alternative-id":["10.1145\/3151032"],"URL":"https:\/\/doi.org\/10.1145\/3151032","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2017,12,5]]},"assertion":[{"value":"2017-04-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"
Publication History"}},{"value":"2017-10-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2017-12-05","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}