{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,6,19]],"date-time":"2024-06-19T14:32:09Z","timestamp":1718807529418},"reference-count":5,"publisher":"World Scientific Pub Co Pte Lt","issue":"02","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J CIRCUIT SYST COMP"],"published-print":{"date-parts":[[2012,4]]},"abstract":"<jats:p> As technology advances, microprocessors that integrate multiple cores on a single chip are becoming increasingly common. How to use these processors to improve the performance of a single program has been a challenge. For general-purpose applications, it is especially difficult to create efficient parallel execution due to the complex control flow and ambiguous data dependences. Thread-level speculation and transactional memory provide two hardware mechanisms that are able to optimistically parallelize potentially dependent threads. However, a compiler that performs detailed performance trade-off analysis is essential for generating efficient parallel programs for such hardware. This compiler must be able to take into consideration the cost of intra-thread as well as inter-thread value communication. On the other hand, the ubiquitous existence of complex, input-dependent control flow and data dependence patterns in general-purpose applications makes it impossible to have one technique optimize all program patterns. In this paper, we propose three optimization techniques to improve thread performance: (i) scheduling instructions and generating recovery code to reduce the critical forwarding path introduced by synchronizing memory-resident values; (ii) identifying reduction variables and transforming the code to minimize serialized execution; and (iii) dynamically merging consecutive iterations of a loop to avoid stalls due to unbalanced workload. Detailed evaluation of the proposed mechanisms shows that each optimization technique improves a subset of the SPEC2000 benchmarks, but none improves all of them. On average, the proposed optimizations improve performance by 7% for the set of SPEC2000 benchmarks that have already been optimized for register-resident value communication. <\/jats:p>","DOI":"10.1142\/s0218126612400087","type":"journal-article","created":{"date-parts":[[2012,6,11]],"date-time":"2012-06-11T02:35:07Z","timestamp":1339382107000},"page":"1240008","source":"Crossref","is-referenced-by-count":5,"title":["CODE TRANSFORMATIONS FOR ENHANCING THE PERFORMANCE OF SPECULATIVELY PARALLEL THREADS"],"prefix":"10.1142","volume":"21","author":[{"given":"SHENGYUE","family":"WANG","sequence":"first","affiliation":[{"name":"Oracle Corporation, Santa Clara, California, 95054, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"PEN-CHUNG","family":"YEW","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, University of Minnesota, Minneapolis, Minnesota, 55455, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"ANTONIA","family":"ZHAI","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, University of Minnesota, Minneapolis, Minnesota, 55455, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"219","published-online":{"date-parts":[[2012,6,11]]},"reference":[{"key":"rf17","volume-title":"Compilers: Principles, Techniques, and Tools","author":"Aho A. V.","year":"2006"},{"key":"rf18","volume-title":"High Performance Compilers for Parallel Computing","author":"Wolfe M.","year":"1996"},{"key":"rf19","volume-title":"Optimizing Compilers for Modern Architectures: A Dependence-based Approach","author":"Kennedy K.","year":"2002"},{"key":"rf23","volume":"48","author":"Tsai J.-Y.","journal-title":"IEEE Trans. Comput. Special Issue on Multithreaded Architectures"},{"key":"rf31","author":"Tsai J.-Y.","journal-title":"Int. J. Parallel Programming \u2014 Special Issue on Languages and Compilers for Parallel Computing"}],"container-title":["Journal of Circuits, Systems and Computers"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S0218126612400087","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,8,7]],"date-time":"2019-08-07T04:06:24Z","timestamp":1565150784000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/abs\/10.1142\/S0218126612400087"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,4]]},"references-count":5,"journal-issue":{"issue":"02","published-online":{"date-parts":[[2012,6,11]]},"published-print":{"date-parts":[[2012,4]]}},"alternative-id":["10.1142\/S0218126612400087"],"URL":"https:\/\/doi.org\/10.1142\/s0218126612400087","relation":{},"ISSN":["0218-1266","1793-6454"],"issn-type":[{"value":"0218-1266","type":"print"},{"value":"1793-6454","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,4]]}}}