{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,13]],"date-time":"2025-12-13T06:59:16Z","timestamp":1765609156855,"version":"3.41.0"},"reference-count":51,"publisher":"Association for Computing Machinery (ACM)","issue":"2","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2025,6,30]]},"abstract":"<jats:p>As the complexity of real-world server applications continues to grow, performance optimizations for large-scale applications are becoming increasingly challenging. The success of online optimization offered by OCOLOS and Dynimize proves that binary rewriting based on edge profiling data can significantly accelerate these applications. However, no similar online binary optimizer is currently available on the AArch64 platform. In response to the growing adoption of the AArch64 platform, this article introduces AOBO, a fast-switching online binary optimizer specifically designed for AArch64. In addition to providing practical and efficient engineering support for AArch64-specific features, AOBO overcomes the challenge of lacking hardware counters for edge profiling on most commercially available AArch64 servers. In particular, AOBO embraces a novel edge weight estimation scheme to deliver more accurate edge estimation, which in turn allows AOBO\u2019s binary rewriter to generate more efficient code. Furthermore, time spent on AOBO\u2019s online code replacement stage is optimized to work at a subsecond level, thus enabling a fast switch from running the original binary to running the optimized one. We evaluate AOBO with CINT2017, GCC, MySQL and MongoDB, measuring the accuracy and coverage of the estimated edge weights, the performance improvements of the optimized binaries, and the online optimization cost. To make a fair comparison, we are using the performance data of the binaries generated by the default compilation scripts in the software packages as a baseline. Experimental data shows that AOBO can offer a more accurate edge weight estimation and generate binaries with superior performance. Furthermore, AOBO achieves online optimization with a very small overhead and significantly improves the performance of large-scale applications. Compared with the baselines, AOBO\u2019s online optimization can achieve 24.7% and 31.11% performance improvement respectively for MySQL and MongoDB. Notably, application pause time is reduced from 1,599.8 milliseconds to 462.1 milliseconds for MySQL, and from 1,765.9 milliseconds to 507.1 milliseconds for MongoDB.<\/jats:p>","DOI":"10.1145\/3736170","type":"journal-article","created":{"date-parts":[[2025,5,16]],"date-time":"2025-05-16T11:15:21Z","timestamp":1747394121000},"page":"1-27","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["AOBO: A Fast-Switching Online Binary Optimizer on AArch64"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6275-7701","authenticated-orcid":false,"given":"Wenlong","family":"Mu","sequence":"first","affiliation":[{"name":"School of Data Science and Engineering, East China Normal University","place":["Shanghai, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-1679-8018","authenticated-orcid":false,"given":"Yue","family":"Tang","sequence":"additional","affiliation":[{"name":"School of Data Science and Engineering, East China Normal University","place":["Shanghai, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5126-7192","authenticated-orcid":false,"given":"Bo","family":"Huang","sequence":"additional","affiliation":[{"name":"School of Data Science and Engineering, East China Normal University","place":["Shanghai, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5787-6781","authenticated-orcid":false,"given":"Jianmei","family":"Guo","sequence":"additional","affiliation":[{"name":"School of Data Science and Engineering, East China Normal University","place":["Shanghai, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,7]]},"reference":[{"key":"e_1_3_2_2_2","first-page":"419","volume-title":"Proceedings of the 17th USENIX Symposium on Networked Systems Design and Implementation","author":"Agache Alexandru","year":"2020","unstructured":"Alexandru Agache, Marc Brooker, Alexandra Iordache, Anthony Liguori, Rolf Neugebauer, Phil Piwonka, and Diana-Maria Popa. 2020. Firecracker: Lightweight virtualization for serverless applications. In Proceedings of the 17th USENIX Symposium on Networked Systems Design and Implementation. USENIX Association, Santa Clara, CA, 419\u2013434. Retrieved from https:\/\/www.usenix.org\/conference\/nsdi20\/presentation\/agache"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1145\/3508352.3549424"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/3182177"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/3640537.3641573"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","unstructured":"Derek Bruening Timothy Garnett and Saman Amarasinghe. 2003. An infrastructure for adaptive dynamic optimization. In International Symposium on Code Generation and Optimization CGO\u201903. IEEE 265\u2013275. 10.1109\/CGO.2003.1191551","DOI":"10.1109\/CGO.2003.1191551"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/2854038.2854044"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2011.233"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2013.6494982"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/3314221.3314587"},{"key":"e_1_3_2_11_2","unstructured":"Brendan Gregg. 2013. Linux performance analysis and tools. Retrieved from https:\/\/www.brendangregg.com\/Slides\/SCaLE_Linux_Performance2013.pdf"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/2541228.2555311"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2022.3159249"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/3470451"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/3498714"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/CGO57630.2024.10444807"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-21878-1_18"},{"key":"e_1_3_2_18_2","volume-title":"An Introduction to Last Branch Records","author":"Kleen Andi","year":"2016","unstructured":"Andi Kleen. 2016. An Introduction to Last Branch Records. Retrieved November 5, 2024 from https:\/\/lwn.net\/Articles\/680985\/"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/1369396.1370017"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/3447786.3456248"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/3302516.3307358"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-77560-7_20"},{"key":"e_1_3_2_23_2","unstructured":"Arm Limited. 2024. AArch64 Branch Record Buffer Extension Registers Summary. Retrieved November 5 2024 from https:\/\/developer.arm.com\/documentation\/101595\/0001\/AArch64-registers\/AArch64-Branch-Record-Buffer-Extension-registers-summary"},{"key":"e_1_3_2_24_2","unstructured":"Arm Limited. 2024. Embedded Trace Macrocell Architecture Specification. Retrieved November 5 2024 from https:\/\/developer.arm.com\/documentation\/ihi0014\/latest\/"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/3242089"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/CLUSTER52292.2023.00032"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/3418055"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/1065010.1065034"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2004.1281660"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/3461648.3463853"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2007.35"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/3572848.3577475"},{"key":"e_1_3_2_33_2","first-page":"33","volume-title":"Proceedings of the 26th USENIX Conference on Security Symposium","author":"Ning Zhenyu","year":"2017","unstructured":"Zhenyu Ning and Fengwei Zhang. 2017. Ninja: Towards transparent tracing and debugging on ARM. In Proceedings of the 26th USENIX Conference on Security Symposium (Vancouver, BC, Canada). USENIX Association, USA, 33\u201349."},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIFS.2018.2883027"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/LLVM-HPC.2014.8"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1145\/3192366.3192374"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.5555\/3049832.3049858"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.5555\/3314872.3314876"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/3446804.3446843"},{"key":"e_1_3_2_40_2","unstructured":"LLVM Project. 2021. The Interface for the Profile Inference Algorithm Profi. Retrieved November 11 2024 from https:\/\/github.com\/llvm\/llvm-project\/blob\/main\/llvm\/include\/llvm\/Transforms\/Utils\/SampleProfileInference.h"},{"key":"e_1_3_2_41_2","unstructured":"LLVM Project. 2023. The Implementation of Stale Profile Matching in BOLT. Retrieved November 11 2024 from https:\/\/github.com\/llvm\/llvm-project\/blob\/main\/bolt\/lib\/Profile\/StaleProfileMatching.cpp"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1007\/s41635-023-00133-3"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/3575693.3575727"},{"key":"e_1_3_2_44_2","volume-title":"Statistical Profiling Extension for ARMv8-A","author":"Williams Michael","year":"2017","unstructured":"Michael Williams. 2017. Statistical Profiling Extension for ARMv8-A. Retrieved November 5, 2024 from https:\/\/community.arm.com\/arm-community-blogs\/b\/architectures-and-processors-blog\/posts\/statistical-profiling-extension-for-armv8-a"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1145\/3338502.3359763"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1038\/s44172-023-00127-7"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1145\/3485513"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1002\/ecja.20185"},{"key":"e_1_3_2_49_2","volume-title":"Optimize MySQL and MariaDB CPU Performance With Dynimize","author":"Yeager David","year":"2023","unstructured":"David Yeager. 2023. Optimize MySQL and MariaDB CPU Performance With Dynimize. Technical Report. DYNIMIZE INC. Retrieved from https:\/\/dynimize.com\/"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1145\/3341109"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIFS.2024.3372816"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO56248.2022.00045"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3736170","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,1]],"date-time":"2025-07-01T12:32:45Z","timestamp":1751373165000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3736170"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,30]]},"references-count":51,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,6,30]]}},"alternative-id":["10.1145\/3736170"],"URL":"https:\/\/doi.org\/10.1145\/3736170","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2025,6,30]]},"assertion":[{"value":"2024-07-10","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-05-08","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-07-01","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}