{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T05:06:04Z","timestamp":1750309564958,"version":"3.41.0"},"reference-count":58,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2025,3,20]],"date-time":"2025-03-20T00:00:00Z","timestamp":1742428800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"ICT Innovation","award":["E461100"],"award-info":[{"award-number":["E461100"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2025,3,31]]},"abstract":"<jats:p>Binary translation enables transparent execution, analysis, and modification of the binary program, serving as a core technology that facilitates instruction set emulation, cross-platform compatibility of software, and program instrumentation. Handling indirect branch instructions is widely recognized as a significant performance bottleneck in binary translation. While the target of a direct branch can be determined during the translation phase, an indirect branch requires a runtime lookup from the guest program counter to the host program counter, significantly influencing the performance of translator. Although several methods have been proposed to accelerate this process, each guest indirect branch instruction still translates into approximately 10 host instructions, resulting in considerable overhead.<\/jats:p>\n          <jats:p>\n            This article introduces Tiaozhuan, which addresses this issue by employing two optimization schemes. 
First,\n            <jats:italic>full address mapping<\/jats:italic>\n            uses a larger address space to store address mappings from guest to host, effectively reducing the number of instructions required to look up the target of an indirect branch. Second,\n            <jats:italic>exception-assisted branch elimination<\/jats:italic>\n            further eliminates branch instructions that check the correctness of targets in the lookup process. These two approaches enable indirect branch target lookup to be completed within one to two instructions, noticeably decreasing the overhead of indirect branches. Compared to state-of-the-art mechanisms, the SPEC CPU2006 benchmark suite showed a reduction in the number of instructions by an average of 4.2%, with the highest observed performance improvement reaching 19.4% and an average increase of 3.9%.\n          <\/jats:p>","DOI":"10.1145\/3703355","type":"journal-article","created":{"date-parts":[[2024,11,4]],"date-time":"2024-11-04T09:48:48Z","timestamp":1730713728000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Tiaozhuan: A General and Efficient Indirect Branch Optimization for Binary Translation"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2640-8173","authenticated-orcid":false,"given":"Xinyu","family":"Li","sequence":"first","affiliation":[{"name":"State Key Lab of Processor, Institute of Computing Technology Chinese Academy of Sciences, Beijing, China and University of Chinese Academy of Sciences, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-7461-5481","authenticated-orcid":false,"given":"Guangyao","family":"Guo","sequence":"additional","affiliation":[{"name":"State Key Lab of Processor, Institute of Computing Technology Chinese Academy of Sciences, Beijing, China and University of Chinese Academy of Sciences, Beijing, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-9380-9096","authenticated-orcid":false,"given":"Yanzhi","family":"Lan","sequence":"additional","affiliation":[{"name":"State Key Lab of Processor, State Key Lab of Processors, Institute of Computing Technology, CAS, Beijing, China and University of Chinese Academy of Sciences, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-3118-8477","authenticated-orcid":false,"given":"Feng","family":"Xue","sequence":"additional","affiliation":[{"name":"State Key Lab of Processor, Institute of Computing Technology Chinese Academy of Sciences, Beijing, China and University of Chinese Academy of Sciences, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-1247-9644","authenticated-orcid":false,"given":"Chenji","family":"Han","sequence":"additional","affiliation":[{"name":"State Key Lab of Processor, Institute of Computing Technology Chinese Academy of Sciences, Beijing, China and University of Chinese Academy of Sciences, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2401-8021","authenticated-orcid":false,"given":"Gen","family":"Niu","sequence":"additional","affiliation":[{"name":"State Key Lab of Processor, Institute of Computing Technology Chinese Academy of Sciences, Beijing, China and University of Chinese Academy of Sciences, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0430-3669","authenticated-orcid":false,"given":"Fuxin","family":"Zhang","sequence":"additional","affiliation":[{"name":"State Key Lab of Processor, Institute of Computing Technology Chinese Academy of Sciences, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2025,3,20]]},"reference":[{"unstructured":"Apple. 2021. About the Rosetta Translation Environment. 
Retrieved March 30 2023 from https:\/\/developer.apple.com\/documentation\/apple-silicon\/about-the-rosetta-translation-environment","key":"e_1_3_3_2_2"},{"doi-asserted-by":"crossref","unstructured":"Vasanth Bala Evelyn Duesterwald and Sanjeev Banerjia. 2000. Dynamo: A transparent dynamic optimization system. ACM SIGPLAN Notices 35 5 (2000) 1\u201312.","key":"e_1_3_3_3_2","DOI":"10.1145\/358438.349303"},{"key":"e_1_3_3_4_2","volume-title":"Proceedings of the 2005 USENIX Annual Technical Conference: FREENIX Track","author":"Bellard Fabrice","year":"2005","unstructured":"Fabrice Bellard. 2005. QEMU, a fast and portable dynamic translator. In Proceedings of the 2005 USENIX Annual Technical Conference: FREENIX Track. 41\u201346. http:\/\/www.usenix.org\/events\/usenix05\/tech\/freenix\/bellard.html"},{"doi-asserted-by":"publisher","key":"e_1_3_3_5_2","DOI":"10.1109\/IISWC.2009.5306785"},{"doi-asserted-by":"publisher","key":"e_1_3_3_6_2","DOI":"10.5555\/776261.776290"},{"doi-asserted-by":"publisher","key":"e_1_3_3_7_2","DOI":"10.1145\/3381052.3381322"},{"doi-asserted-by":"publisher","key":"e_1_3_3_8_2","DOI":"10.1016\/S0167-6423(01)00014-4"},{"key":"e_1_3_3_9_2","article-title":"Intel\u00ae 64 and IA-32 Architectures Software Developer\u2019s Manual","author":"Corporation Intel","year":"2024","unstructured":"Intel Corporation. 2024. Intel\u00ae 64 and IA-32 Architectures Software Developer\u2019s Manual. Volume 3A: System Programming Guide. 
Intel Corporation.","journal-title":"Volume 3A: System Programming Guide."},{"doi-asserted-by":"publisher","key":"e_1_3_3_10_2","DOI":"10.5555\/3049832.3049855"},{"doi-asserted-by":"publisher","key":"e_1_3_3_11_2","DOI":"10.1145\/3313808.3313811"},{"doi-asserted-by":"publisher","key":"e_1_3_3_12_2","DOI":"10.1145\/2693433.2693437"},{"doi-asserted-by":"publisher","key":"e_1_3_3_13_2","DOI":"10.1145\/3062341.3062371"},{"doi-asserted-by":"publisher","key":"e_1_3_3_14_2","DOI":"10.1145\/2866573"},{"key":"e_1_3_3_15_2","first-page":"15","volume-title":"Proceedings of the International Symposium on Code Generation and Optimization: Feedback-Directed and Runtime Optimization","author":"Dehnert James C.","year":"2003","unstructured":"James C. Dehnert, Brian K. Grant, John P. Banning, Richard Johnson, Thomas Kistler, Alexander Klaiber, and Jim Mattson. 2003. The Transmeta Code Morphing\u2122 software: Using speculation, recovery, and adaptive retranslation to address real-life challenges. In Proceedings of the International Symposium on Code Generation and Optimization: Feedback-Directed and Runtime Optimization(CGO \u201903). IEEE, 15\u201324."},{"key":"e_1_3_3_16_2","doi-asserted-by":"crossref","first-page":"131","DOI":"10.1145\/3033019.3033028","volume-title":"Proceedings of the 26th International Conference on Compiler Construction","author":"Federico Alessandro Di","year":"2017","unstructured":"Alessandro Di Federico, Mathias Payer, and Giovanni Agosta. 2017. rev. ng: A unified binary analysis framework to recover CFGs and function boundaries. In Proceedings of the 26th International Conference on Compiler Construction. 131\u2013141."},{"doi-asserted-by":"crossref","unstructured":"Kemal Ebcio\u011flu and Erik R. Altman. 1997. DAISY: Dynamic compilation for 100% architectural compatibility. 
ACM SIGARCH Computer Architecture News 25 2 (1997) 26\u201337.","key":"e_1_3_3_17_2","DOI":"10.1145\/384286.264126"},{"doi-asserted-by":"publisher","key":"e_1_3_3_18_2","DOI":"10.1145\/3453933.3454022"},{"key":"e_1_3_3_19_2","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1007\/978-3-642-35632-2_8","volume-title":"Runtime Verification","author":"Eyolfson Jon","year":"2013","unstructured":"Jon Eyolfson and Patrick Lam. 2013. Detecting unread memory using dynamic binary translation. In Runtime Verification, Shaz Qadeer and Serdar Tasiran (Eds.). Springer, Berlin, Germany, 49\u201363."},{"key":"e_1_3_3_20_2","first-page":"1013","volume-title":"Proceedings of the 2024 USENIX Annual Technical Conference (USENIX ATC \u201924)","author":"Gao Chen","unstructured":"Chen Gao, Xiangwei Meng, Wei Li, Jinhui Lai, Yiran Zhang, and Fengyuan Ren. 2024. CrossMapping: Harmonizing memory consistency in Cross-ISA binary translation. In Proceedings of the 2024 USENIX Annual Technical Conference (USENIX ATC \u201924). 1013\u20131028."},{"unstructured":"Michael Gschwind. 1998. Method and Apparatus for Determining Branch Addresses in Programs Generated by Binary Translation. Report YOR8-1998-0334. IBM.","key":"e_1_3_3_21_2"},{"doi-asserted-by":"publisher","key":"e_1_3_3_22_2","DOI":"10.1016\/j.sysarc.2010.07.008"},{"doi-asserted-by":"publisher","key":"e_1_3_3_23_2","DOI":"10.1145\/1878921.1878923"},{"doi-asserted-by":"publisher","key":"e_1_3_3_24_2","DOI":"10.1145\/1186736.1186737"},{"doi-asserted-by":"publisher","key":"e_1_3_3_25_2","DOI":"10.1109\/CGO.2007.10"},{"doi-asserted-by":"publisher","key":"e_1_3_3_26_2","DOI":"10.5555\/268940.268941"},{"doi-asserted-by":"publisher","key":"e_1_3_3_27_2","DOI":"10.1109\/MM.2009.30"},{"doi-asserted-by":"publisher","key":"e_1_3_3_28_2","DOI":"10.1080\/09540091.2022.2041555"},{"unstructured":"Huawei. 2022. Huawei Kunpeng ExaGear. 
Retrieved March 30 2023 from https:\/\/mirrors.huaweicloud.com\/kunpeng\/archive\/ExaGear\/","key":"e_1_3_3_29_2"},{"unstructured":"The Linux Programming Interface. 2024. mmap(2)\u2014Linux Manual Page. Retrieved April 16 2024 from https:\/\/man7.org\/linux\/man-pages\/man2\/mmap.2.html","key":"e_1_3_3_30_2"},{"key":"e_1_3_3_31_2","first-page":"1","volume-title":"Proceedings of the 11th ACM Conference on Computing Frontiers","author":"Jia Ning","year":"2014","unstructured":"Ning Jia, Chun Yang, Yu He, and Xu Cheng. 2014. DTT: Program structure-aware indirect branch optimization via direct-TPC-table in DBT system. In Proceedings of the 11th ACM Conference on Computing Frontiers. 1\u201310."},{"key":"e_1_3_3_32_2","first-page":"1","volume-title":"Proceedings of the International Conference on Systems and Storage","author":"Jia Ning","year":"2014","unstructured":"Ning Jia, Chun Yang, Yu He, and Xu Cheng. 2014. SPTU: Improving dynamic binary translation through software prediction with target updating. In Proceedings of the International Conference on Systems and Storage. 1\u201312."},{"issue":"7","key":"e_1_3_3_33_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/2517326.2451516","article-title":"SPIRE: Improving dynamic binary translation through SPC-indexed indirect branch redirecting","volume":"48","author":"Jia Ning","year":"2013","unstructured":"Ning Jia, Chun Yang, Jing Wang, Dong Tong, and Keyi Wang. 2013. SPIRE: Improving dynamic binary translation through SPC-indexed indirect branch redirecting. ACM SIGPLAN Notices 48, 7 (2013), 1\u201312.","journal-title":"ACM SIGPLAN Notices"},{"key":"e_1_3_3_34_2","first-page":"25","volume-title":"Proceedings of the 2003 International Symposium on Code Generation and Optimization (CGO \u201903).","author":"Kim Ho-Seop","year":"2003","unstructured":"Ho-Seop Kim and James E. Smith. 2003. Dynamic binary translation for accumulator-oriented architectures. 
In Proceedings of the 2003 International Symposium on Code Generation and Optimization (CGO \u201903). IEEE, 25\u201335."},{"doi-asserted-by":"publisher","key":"e_1_3_3_35_2","DOI":"10.1109\/MICRO.2003.1253200"},{"doi-asserted-by":"publisher","key":"e_1_3_3_36_2","DOI":"10.1145\/3458744.3473348"},{"key":"e_1_3_3_37_2","first-page":"235","volume-title":"Proceedings of the International Conference on Algorithms and Architectures for Parallel Processing","author":"Lan Yanzhi","year":"2023","unstructured":"Yanzhi Lan, Qi Hu, Gen Niu, Xinyu Li, Liangpu Wang, and Fuxin Zhang. 2023. LAST: An efficient in-place static binary translator for RISC architectures. In Proceedings of the International Conference on Algorithms and Architectures for Parallel Processing. 235\u2013254."},{"unstructured":"Loongson Technology. 2022. LoongArch Documentation. Retrieved November 5 2024 from https:\/\/loongson.github.io\/LoongArch-Documentation\/","key":"e_1_3_3_38_2"},{"unstructured":"Loongson Technology. 2024. LS3A6000 Specification. Retrieved November 5 2024 from https:\/\/loongson.cn\/EN\/product\/show?id=11","key":"e_1_3_3_39_2"},{"doi-asserted-by":"publisher","key":"e_1_3_3_40_2","DOI":"10.1145\/1064978.1065034"},{"unstructured":"Microsoft. 2023. How x86 Emulation Works on Arm. Retrieved November 15 2023 from https:\/\/learn.microsoft.com\/en-us\/windows\/arm\/apps-on-arm-x86-emulation","key":"e_1_3_3_41_2"},{"unstructured":"Ingo Molnar. 2009. Performance Counters for Linux. Retrieved February 23 2022 from https:\/\/lwn.net\/Articles\/337493\/","key":"e_1_3_3_42_2"},{"doi-asserted-by":"publisher","key":"e_1_3_3_43_2","DOI":"10.1145\/1543136.1542472"},{"doi-asserted-by":"publisher","key":"e_1_3_3_44_2","DOI":"10.1145\/1273442.1250746"},{"doi-asserted-by":"publisher","key":"e_1_3_3_45_2","DOI":"10.1145\/3534056.3534939"},{"doi-asserted-by":"publisher","key":"e_1_3_3_46_2","DOI":"10.1145\/1815695.1815724"},{"unstructured":"Android Open Source Project. 2024. Common Android Kernel Tree. 
Retrieved April 20 2024 from https:\/\/android.googlesource.com\/kernel\/common.git","key":"e_1_3_3_47_2"},{"doi-asserted-by":"publisher","key":"e_1_3_3_48_2","DOI":"10.1145\/3519939.3523719"},{"unstructured":"Ian Rogers. 2002. Optimising Java Programs through Basic Block Dynamic Compilation. Master\u2019s Thesis. University of Manchester.","key":"e_1_3_3_49_2"},{"key":"e_1_3_3_50_2","first-page":"200","volume-title":"Proceedings of the 2004 8th International Parallel and Distributed Processing Symposium.","author":"Scott Kevin","year":"2004","unstructured":"Kevin Scott, Naveen Kumar, Bruce R. Childers, Jack W. Davidson, and Mary Lou Soffa. 2004. Overhead reduction techniques for software dynamic translation. In Proceedings of the 2004 8th International Parallel and Distributed Processing Symposium. IEEE, 200."},{"doi-asserted-by":"publisher","key":"e_1_3_3_51_2","DOI":"10.1145\/2629335"},{"key":"e_1_3_3_52_2","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1145\/1134760.1220166","volume-title":"Proceedings of the 2nd International Conference on Virtual Execution Environments","author":"Sridhar Swaroop","year":"2006","unstructured":"Swaroop Sridhar, Jonathan S. Shapiro, Eric Northup, and Prashanth P. Bungale. 2006. HDTrans: An open source, low-level dynamic instrumentation system. In Proceedings of the 2nd International Conference on Virtual Execution Environments. 175\u2013185."},{"unstructured":"The Kernel Development Community. 2024. Documentation for \/proc\/sys\/vm\/. Retrieved November 5 2024 from https:\/\/docs.kernel.org\/admin-guide\/sysctl\/vm.html","key":"e_1_3_3_53_2"},{"unstructured":"Andrew Waterman Yunsup Lee Rimas Avizienis David A. Patterson and Krste Asanovic. 2015. The RISC-V Instruction Set Manual Volume II: Privileged Architecture. Version 1.7. Technical Report UCB\/EECS-2015-49. EECS Department University of California.","key":"e_1_3_3_54_2"},{"unstructured":"Andrew Waterman Yunsup Lee David A. Patterson and Krste Asanovic. 2011. 
The RISC-V Instruction Set Manual Volume I: Base User-Level ISA. Technical Report UCB\/EECS-2011-62. EECS Department UC Berkeley.","key":"e_1_3_3_55_2"},{"key":"e_1_3_3_56_2","first-page":"2","article-title":"Loongson instruction set architecture technology","volume":"60","author":"Weiwu Hu","year":"2023","unstructured":"Hu Weiwu, Wang Wenxiang, Wu Ruiyang, Wang Huandong, Zeng Lu, Xu Chenghua, Gao Xiang, and Zhang Fuxin. 2023. Loongson instruction set architecture technology. Journal of Computer Research and Development 60 (2023), 2\u201316.","journal-title":"Journal of Computer Research and Development"},{"doi-asserted-by":"publisher","key":"e_1_3_3_57_2","DOI":"10.1145\/3640813"},{"key":"e_1_3_3_58_2","first-page":"280","volume-title":"Proceedings of the 2012 IEEE 14th International Conference on High Performance Computing and Communication and the 2012 IEEE 9th International Conference on Embedded Software and Systems","author":"Yin Liao","year":"2012","unstructured":"Liao Yin, Jiang Haitao, Sun Guangzhong, Jin Guojie, and Chen Guoliang. 2012. Improve indirect branch prediction with private cache in dynamic binary translation. In Proceedings of the 2012 IEEE 14th International Conference on High Performance Computing and Communication and the 2012 IEEE 9th International Conference on Embedded Software and Systems. IEEE, 280\u2013286."},{"unstructured":"Joseph Yiu. 2015. ARMv8-M Architecture Technical Overview. White Paper. 
ARM.","key":"e_1_3_3_59_2"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3703355","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3703355","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:19:03Z","timestamp":1750295943000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3703355"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,20]]},"references-count":58,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,3,31]]}},"alternative-id":["10.1145\/3703355"],"URL":"https:\/\/doi.org\/10.1145\/3703355","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2025,3,20]]},"assertion":[{"value":"2024-07-15","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-10-26","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-03-20","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}