{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:22:14Z","timestamp":1750220534665,"version":"3.41.0"},"reference-count":41,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2020,9,30]],"date-time":"2020-09-30T00:00:00Z","timestamp":1601424000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"NSF","award":["CCF-1615014"],"award-info":[{"award-number":["CCF-1615014"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2020,12,31]]},"abstract":"<jats:p>A key requirement for efficient general purpose approximate computing is an amalgamation of flexible hardware design and intelligent application tuning, which together can leverage the appropriate amount of approximation that the applications engender and reap the best efficiency gains from them. To achieve this, we have identified three important features to build better general-purpose cross-layer approximation systems: \u2460 individual per-operation (\u201cspatio-temporally fine-grained\u201d) approximation, \u2461 hardware-cognizant application tuning for approximation, \u2462 systemwide approximation-synergy.<\/jats:p>\n          <jats:p>\n            We build an efficient general purpose approximation system called SHASTA: Synergic HW-SW Architecture for Spatio-Temporal Approximation, to achieve these goals.\n            <jats:sup>1<\/jats:sup>\n            First, in terms of hardware, SHASTA approximates both compute and memory\u2014SHASTA proposes (a) a form of timing approximation called Slack-control Approximation, which controls the computation timing of each approximation operation and (b) a Dynamic Pre-L1 Load Approximation mechanism to approximate loads prior to cache access. These hardware mechanisms are designed to achieve fine-grained spatio-temporally diverse approximation. Next, SHASTA proposes a Hardware-cognizant Approximation Tuning mechanism to tune an application\u2019s approximation to achieve the optimum execution efficiency under the prescribed error tolerance. The tuning mechanism is implemented atop a gradient descent algorithm and, thus, the application\u2019s approximation is tuned along the steepest error vs. execution efficiency gradient. Finally, SHASTA is designed with a full-system perspective, which achieves Synergic benefits across its optimizations, building a closer-to-ideal general purpose approximation system.\n          <\/jats:p>\n          <jats:p>SHASTA is implemented on top of an OOO core and achieves mean speedups\/energy savings of 20%\u201340% over a non-approximate baseline for greater than 90% accuracy\u2014these benefits are substantial for applications executing on a traditional general purpose processing system. SHASTA can be tuned to specific accuracy constraints and execution metrics and is quantitatively shown to achieve 2\u201315\u00d7 higher benefits, in terms of performance and energy, compared to prior work.<\/jats:p>","DOI":"10.1145\/3412375","type":"journal-article","created":{"date-parts":[[2020,9,30]],"date-time":"2020-09-30T11:23:50Z","timestamp":1601465030000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["SHASTA"],"prefix":"10.1145","volume":"17","author":[{"given":"Gokul Subramanian","family":"Ravi","sequence":"first","affiliation":[{"name":"University of Wisconsin Madison, Electrical and Computer Engineering, Madison, WI"}]},{"given":"Joshua San","family":"Miguel","sequence":"additional","affiliation":[{"name":"University of Wisconsin Madison, Electrical and Computer Engineering, Madison, WI"}]},{"given":"Mikko","family":"Lipasti","sequence":"additional","affiliation":[{"name":"University of Wisconsin Madison, Electrical and Computer Engineering, Madison, WI"}]}],"member":"320","published-online":{"date-parts":[[2020,9,30]]},"reference":[{"key":"e_1_2_1_1_1","article-title":"X-DNNs: Systematic cross-layer approximations for energy-efficient deep neural networks","volume":"14","author":"Hanif M.","year":"2018","unstructured":"M. Hanif , A. Marchisio , T. Arif , R. Hafiz , S. Rehman , and M. Shafique . 2018 . X-DNNs: Systematic cross-layer approximations for energy-efficient deep neural networks . J. Low Pow. Electron. 14 , 4 (2018). M. Hanif, A. Marchisio, T. Arif, R. Hafiz, S. Rehman, and M. Shafique. 2018. X-DNNs: Systematic cross-layer approximations for energy-efficient deep neural networks. J. Low Pow. Electron. 14, 4 (2018).","journal-title":"J. Low Pow. Electron."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2018.2873951"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2018.2889110"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/1454115.1454128"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/2024716.2024718"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2814270.2814301"},{"volume-title":"Proceedings of the ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages 8 Applications (OOPSLA\u201913)","author":"Carbin Michael","key":"e_1_2_1_7_1","unstructured":"Michael Carbin , Sasa Misailovic , and Martin C. Rinard . 2013. Verifying quantitative reliability for programs that execute on unreliable hardware . In Proceedings of the ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages 8 Applications (OOPSLA\u201913) . ACM, New York, NY, 33--52. Michael Carbin, Sasa Misailovic, and Martin C. Rinard. 2013. Verifying quantitative reliability for programs that execute on unreliable hardware. In Proceedings of the ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages 8 Applications (OOPSLA\u201913). ACM, New York, NY, 33--52."},{"volume-title":"Proceedings of the Design Automation Conference. 555--560","author":"Chippa V. K.","key":"e_1_2_1_8_1","unstructured":"V. K. Chippa , D. Mohapatra , A. Raghunathan , K. Roy , and S. T. Chakradhar . 2010. Scalable effort hardware design: Exploiting algorithmic resilience for energy efficiency . In Proceedings of the Design Automation Conference. 555--560 . V. K. Chippa, D. Mohapatra, A. Raghunathan, K. Roy, and S. T. Chakradhar. 2010. Scalable effort hardware design: Exploiting algorithmic resilience for energy efficiency. In Proceedings of the Design Automation Conference. 555--560."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2150976.2151008"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2012.48"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2019.2920335"},{"volume-title":"Proceedings of the Design, Automation Test in Europe Conference Exhibition (DATE\u201918)","author":"Hashemi S.","key":"e_1_2_1_12_1","unstructured":"S. Hashemi , H. Tann , F. Buttafuoco , and S. Reda . 2018. Approximate computing for biometric security systems: A case study on iris scanning . In Proceedings of the Design, Automation Test in Europe Conference Exhibition (DATE\u201918) . 319--324. S. Hashemi, H. Tann, F. Buttafuoco, and S. Reda. 2018. Approximate computing for biometric security systems: A case study on iris scanning. In Proceedings of the Design, Automation Test in Europe Conference Exhibition (DATE\u201918). 319--324."},{"volume-title":"Proceedings of the 2006 International Symposium on Low Power Electronics and Design (ISLPED\u201906)","author":"Hill E. L.","key":"e_1_2_1_13_1","unstructured":"E. L. Hill and M. H. Lipasti . 2006. Stall cycle redistribution in a transparent fetch pipeline . In Proceedings of the 2006 International Symposium on Low Power Electronics and Design (ISLPED\u201906) . 31--36. E. L. Hill and M. H. Lipasti. 2006. Stall cycle redistribution in a transparent fetch pipeline. In Proceedings of the 2006 International Symposium on Low Power Electronics and Design (ISLPED\u201906). 31--36."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2815400.2815403"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1950365.1950390"},{"volume-title":"Proceedings of the 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201916)","author":"Jain A.","key":"e_1_2_1_16_1","unstructured":"A. Jain , P. Hill , S. C. Lin , M. Khan , M. E. Haque , M. A. Laurenzano , S. Mahlke , L. Tang , and J. Mars . 2016. Concise loads and stores: The case for an asymmetric compute-memory architecture for approximation . In Proceedings of the 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201916) . 1--13. A. Jain, P. Hill, S. C. Lin, M. Khan, M. E. Haque, M. A. Laurenzano, S. Mahlke, L. Tang, and J. Mars. 2016. Concise loads and stores: The case for an asymmetric compute-memory architecture for approximation. In Proceedings of the 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201916). 1--13."},{"volume-title":"Proceedings of the 42nd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201909)","author":"Li Sheng","key":"e_1_2_1_17_1","unstructured":"Sheng Li , Jung Ho Ahn , Richard D. Strong , Jay B. Brockman , Dean M. Tullsen , and Norman P. Jouppi . 2009. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures . In Proceedings of the 42nd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201909) . ACM, New York, NY, 469--480. Sheng Li, Jung Ho Ahn, Richard D. Strong, Jay B. Brockman, Dean M. Tullsen, and Norman P. Jouppi. 2009. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures. In Proceedings of the 42nd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201909). ACM, New York, NY, 469--480."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2015.108"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2014.22"},{"volume-title":"Proceedings of the ACM International Conference on Object-oriented Programming Systems Languages 8 Applications (OOPSLA\u201914)","author":"Misailovic Sasa","key":"e_1_2_1_20_1","unstructured":"Sasa Misailovic , Michael Carbin , Sara Achour , Zichao Qi , and Martin C. Rinard . 2014. Chisel: Reliability- and accuracy-aware optimization of approximate computational kernels . In Proceedings of the ACM International Conference on Object-oriented Programming Systems Languages 8 Applications (OOPSLA\u201914) . ACM, New York, NY, 309--328. Sasa Misailovic, Michael Carbin, Sara Achour, Zichao Qi, and Martin C. Rinard. 2014. Chisel: Reliability- and accuracy-aware optimization of approximate computational kernels. In Proceedings of the ACM International Conference on Object-oriented Programming Systems Languages 8 Applications (OOPSLA\u201914). ACM, New York, NY, 309--328."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/1806799.1806808"},{"key":"e_1_2_1_22_1","volume-title":"QAPPA: A Framework for Navigating Quality-energy Tradeoffs with Arbitrary Quantization. Technical Report","author":"Moreau Thierry","year":"2017","unstructured":"Thierry Moreau , Felipe Augusto , Patrick Howe , Armin Alaghi , and Luis Ceze . 2017 . QAPPA: A Framework for Navigating Quality-energy Tradeoffs with Arbitrary Quantization. Technical Report . University of Washington. Thierry Moreau, Felipe Augusto, Patrick Howe, Armin Alaghi, and Luis Ceze. 2017. QAPPA: A Framework for Navigating Quality-energy Tradeoffs with Arbitrary Quantization. Technical Report. University of Washington."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3316781.3317781"},{"volume-title":"Proceedings of the Design, Automation Test in Europe Conference Exhibition (DATE\u201914)","author":"Nepal K.","key":"e_1_2_1_24_1","unstructured":"K. Nepal , Y. Li , R. I. Bahar , and S. Reda . 2014. ABACUS: A technique for automated behavioral synthesis of approximate computing circuits . In Proceedings of the Design, Automation Test in Europe Conference Exhibition (DATE\u201914) . 1--6. K. Nepal, Y. Li, R. I. Bahar, and S. Reda. 2014. ABACUS: A technique for automated behavioral synthesis of approximate computing circuits. In Proceedings of the Design, Automation Test in Europe Conference Exhibition (DATE\u201914). 1--6."},{"volume-title":"Proceedings of the IEEE 24th International Conference on High Performance Computing (HiPC\u201917)","author":"Panyala A.","key":"e_1_2_1_25_1","unstructured":"A. Panyala , O. Subasi , M. Halappanavar , A. Kalyanaraman , D. Chavarria-Miranda , and S. Krishnamoorthy . 2017. Approximate computing techniques for iterative graph algorithms . In Proceedings of the IEEE 24th International Conference on High Performance Computing (HiPC\u201917) . 23--32. A. Panyala, O. Subasi, M. Halappanavar, A. Kalyanaraman, D. Chavarria-Miranda, and S. Krishnamoorthy. 2017. Approximate computing techniques for iterative graph algorithms. In Proceedings of the IEEE 24th International Conference on High Performance Computing (HiPC\u201917). 23--32."},{"key":"e_1_2_1_26_1","volume-title":"Expax: A Framework for Automating Approximate Programming. Technical Report","author":"Park Jongse","year":"2014","unstructured":"Jongse Park , Xin Zhang , Kangqi Ni , Hadi Esmaeilzadeh , and Mayur Naik . 2014 . Expax: A Framework for Automating Approximate Programming. Technical Report . Georgia Institute of Technology . Jongse Park, Xin Zhang, Kangqi Ni, Hadi Esmaeilzadeh, and Mayur Naik. 2014. Expax: A Framework for Automating Approximate Programming. Technical Report. Georgia Institute of Technology."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/LES.2017.2658566"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3061639.3062333"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2018.2864269"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2016.2586379"},{"volume-title":"Proceedings of the International Conference on Hardware\/Software Codesign and System Synthesis (CODES+ISSS\u201913)","author":"Rahimi A.","key":"e_1_2_1_31_1","unstructured":"A. Rahimi , A. Marongiu , R. K. Gupta , and L. Benini . 2013. A variability-aware OpenMP environment for efficient execution of accuracy-configurable computation on shared-FPU processor clusters . In Proceedings of the International Conference on Hardware\/Software Codesign and System Synthesis (CODES+ISSS\u201913) . 1--10. A. Rahimi, A. Marongiu, R. K. Gupta, and L. Benini. 2013. A variability-aware OpenMP environment for efficient execution of accuracy-configurable computation on shared-FPU processor clusters. In Proceedings of the International Conference on Hardware\/Software Codesign and System Synthesis (CODES+ISSS\u201913). 1--10."},{"volume-title":"Proceedings of the IEEE 25th International Symposium on High Performance Computer Architecture (HPCA\u201919)","author":"Ravi Gokul Subramanian","key":"e_1_2_1_32_1","unstructured":"Gokul Subramanian Ravi and Mikko H. Lipasti . 2019. Recycling data slack in out of order cores . In Proceedings of the IEEE 25th International Symposium on High Performance Computer Architecture (HPCA\u201919) . Gokul Subramanian Ravi and Mikko H. Lipasti. 2019. Recycling data slack in out of order cores. In Proceedings of the IEEE 25th International Symposium on High Performance Computer Architecture (HPCA\u201919)."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.5555\/3317153"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/2597809.2597812"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/2540708.2540711"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/1993498.1993518"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/2872362.2872402"},{"volume-title":"Proceedings of the 52nd Annual Design Automation Conference (DAC\u201915)","author":"Tziantzioulis G.","key":"e_1_2_1_38_1","unstructured":"G. Tziantzioulis , A. M. Gok , S. M. Faisal , N. Hardavellas , S. Ogrenci-Memik , and S. Parthasarathy . 2015. b-HiVE: A bit-level history-based error model with value correlation for voltage-scaled integer and floating point units . In Proceedings of the 52nd Annual Design Automation Conference (DAC\u201915) . ACM, New York, NY. G. Tziantzioulis, A. M. Gok, S. M. Faisal, N. Hardavellas, S. Ogrenci-Memik, and S. Parthasarathy. 2015. b-HiVE: A bit-level history-based error model with value correlation for voltage-scaled integer and floating point units. In Proceedings of the 52nd Annual Design Automation Conference (DAC\u201915). ACM, New York, NY."},{"volume-title":"Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP\u201915)","author":"Vassiliadis Vassilis","key":"e_1_2_1_39_1","unstructured":"Vassilis Vassiliadis , Konstantinos Parasyris , Charalambos Chalios , Christos D. Antonopoulos , Spyros Lalis , Nikolaos Bellas , Hans Vandierendonck , and Dimitrios S. Nikolopoulos . 2015. A programming model and runtime system for significance-aware energy-efficient computing . In Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP\u201915) . ACM, New York, NY, 275--276. Vassilis Vassiliadis, Konstantinos Parasyris, Charalambos Chalios, Christos D. Antonopoulos, Spyros Lalis, Nikolaos Bellas, Hans Vandierendonck, and Dimitrios S. Nikolopoulos. 2015. A programming model and runtime system for significance-aware energy-efficient computing. In Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP\u201915). ACM, New York, NY, 275--276."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/2540708.2540710"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/MDAT.2016.2630270"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3412375","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3412375","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3412375","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:25:01Z","timestamp":1750195501000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3412375"}},"subtitle":["Synergic HW-SW Architecture for Spatio-temporal Approximation"],"short-title":[],"issued":{"date-parts":[[2020,9,30]]},"references-count":41,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2020,12,31]]}},"alternative-id":["10.1145\/3412375"],"URL":"https:\/\/doi.org\/10.1145\/3412375","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2020,9,30]]},"assertion":[{"value":"2019-10-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-07-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-09-30","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}