{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:39:01Z","timestamp":1750307941341,"version":"3.41.0"},"reference-count":16,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2007,6,1]],"date-time":"2007-06-01T00:00:00Z","timestamp":1180656000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2007,6]]},"abstract":"<jats:p>Since the 1980's code reordering has gained popularity as an important way to improve the spatial locality of programs. While the effect of the processor's microarchitecture and memory hierarchy on this optimization technique has been investigated, little research has focused on the impact of the instruction set. In this paper, we analyze the effect of limited branch offset of the MIPS-like instruction set [Hwu et al. 2004, 2005] on code reordering, explore two simple methods to handle the exceeded branches, and propose the bidirectional code layout (BCL) algorithm to reduce the number of branches exceeding the offset limit. The BCL algorithm sorts the chains according to the position of related chains, avoids cache conflict misses deliberately and lays out the code bidirectionally. It strikes a balance among the distance of related blocks, the instruction cache miss rate, the memory size required, and the control flow transfer. Experimental results show that BCL can effectively reduce exceeded branches by 50.1%, on average, with up to 100% for some programs. Except for some programs with little spatial locality, the BCL algorithm can achieve the performance, as the case with no branch offset limitation.<\/jats:p>","DOI":"10.1145\/1250727.1250730","type":"journal-article","created":{"date-parts":[[2007,9,14]],"date-time":"2007-09-14T13:44:55Z","timestamp":1189777495000},"page":"10","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Code reordering on limited branch offset"],"prefix":"10.1145","volume":"4","author":[{"given":"Yu","family":"Chen","sequence":"first","affiliation":[{"name":"Chinese Academy of Sciences, Beijing"}]},{"given":"Fuxin","family":"Zhang","sequence":"additional","affiliation":[{"name":"Chinese Academy of Sciences, Beijing"}]}],"member":"320","published-online":{"date-parts":[[2007,6]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/349299.349303"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/268806.268810"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1360\/crad20060821"},{"key":"e_1_2_1_4_1","first-page":"5","article-title":"Design and analysis of profile-based optimization in Compaq's compilation tools for Alpha","volume":"2","author":"Cohn R.","year":"2000","unstructured":"Cohn , R. and Lowney , P. G. 2000 . Design and analysis of profile-based optimization in Compaq's compilation tools for Alpha . Journal of Instruction Level Parallelism 2 , 5 (May), 1--25. Cohn, R. and Lowney, P. G. 2000. Design and analysis of profile-based optimization in Compaq's compilation tools for Alpha. Journal of Instruction Level Parallelism 2, 5 (May), 1--25.","journal-title":"Journal of Instruction Level Parallelism"},{"volume-title":"USENIX Windows NT Workshop, August.","author":"Cohn R.","key":"e_1_2_1_5_1","unstructured":"Cohn , R. , Goodwin , P. , Lowney , G. , and Rubin , N . 1997. Spike: An optimizer for Alpha\/NT executables . In USENIX Windows NT Workshop, August. Cohn, R., Goodwin, P., Lowney, G., and Rubin, N. 1997. Spike: An optimizer for Alpha\/NT executables. In USENIX Windows NT Workshop, August."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/258915.258931"},{"volume-title":"Godson processor project data","author":"Hwu W. W.","key":"e_1_2_1_8_1","unstructured":"Hwu , W. W. , Zhang , F. X. , Godson processor project data . Beijing : Institute of computing technology, Chinese Academy of Sciences , Beijing. Hwu, W. W., Zhang, F. X., et al. 2004. Godson processor project data. Beijing: Institute of computing technology, Chinese Academy of Sciences, Beijing."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11390-005-0243-6"},{"volume-title":"Proceedings of the 4th International Conference on High Performance Computer Architecture, Las Vegas (Feb.). 244--253","author":"Kalamatianos J.","key":"e_1_2_1_10_1","unstructured":"Kalamatianos , J. and Kaeli , D. R . 1998. Temporal-based procedure reordering for improved instruction cache performance . In Proceedings of the 4th International Conference on High Performance Computer Architecture, Las Vegas (Feb.). 244--253 . Kalamatianos, J. and Kaeli, D. R. 1998. Temporal-based procedure reordering for improved instruction cache performance. In Proceedings of the 4th International Conference on High Performance Computer Architecture, Las Vegas (Feb.). 244--253."},{"key":"e_1_2_1_11_1","first-page":"4","article-title":"Design and implementation of a lightweight dynamic optimization system","volume":"6","author":"Lu J. W.","year":"2004","unstructured":"Lu , J. W. , Chen , H. , Yew , P. C. , and Hsu , W. C. 2004 . Design and implementation of a lightweight dynamic optimization system . Journal of Instruction-Level Parallelism 6 , 4 (Apr.), 332--341. Lu, J. W., Chen, H., Yew, P. C., and Hsu, W. C. 2004. Design and implementation of a lightweight dynamic optimization system. Journal of Instruction-Level Parallelism 6, 4 (Apr.), 332--341.","journal-title":"Journal of Instruction-Level Parallelism"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/70082.68200"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1002\/1097-024X(200101)31:1%3C67::AID-SPE357%3E3.0.CO;2-A"},{"volume-title":"Proceedings of International Conference on Computer Aided Design","author":"Parameswaran S.","key":"e_1_2_1_14_1","unstructured":"Parameswaran , S. and Henkel , J . 2001. I-CoPES: Fast instruction code placement for embedded systems to improve performance and energy efficiency . In Proceedings of International Conference on Computer Aided Design , San Jose, CA (Nov.). 635--641. Parameswaran, S. and Henkel, J. 2001. I-CoPES: Fast instruction code placement for embedded systems to improve performance and energy efficiency. In Proceedings of International Conference on Computer Aided Design, San Jose, CA (Nov.). 635--641."},{"key":"e_1_2_1_15_1","first-page":"11","article-title":"REMcode: Relocating embedded code for improving system efficiency","volume":"151","author":"Parameswaran S.","year":"2004","unstructured":"Parameswaran , S. and Henkel , J. 2004 . REMcode: Relocating embedded code for improving system efficiency . IEE Proc.-Comput. Digit. Tech. 151 , 11 (Nov.), 431--435. Parameswaran, S. and Henkel, J. 2004. REMcode: Relocating embedded code for improving system efficiency. IEE Proc.-Comput. Digit. Tech. 151, 11 (Nov.), 431--435.","journal-title":"IEE Proc.-Comput. Digit. Tech."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/93542.93550"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2005.13"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1250727.1250730","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1250727.1250730","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T14:52:19Z","timestamp":1750258339000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1250727.1250730"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2007,6]]},"references-count":16,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2007,6]]}},"alternative-id":["10.1145\/1250727.1250730"],"URL":"https:\/\/doi.org\/10.1145\/1250727.1250730","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2007,6]]},"assertion":[{"value":"2007-06-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}