{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T02:10:05Z","timestamp":1750299005893,"version":"3.41.0"},"reference-count":36,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2025,5,23]],"date-time":"2025-05-23T00:00:00Z","timestamp":1747958400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62272407"],"award-info":[{"award-number":["62272407"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"\u201cPioneer\u201d and \u201cLeading Goose\u201d R&D Program of Zhejiang","award":["2023C01033"],"award-info":[{"award-number":["2023C01033"]}]},{"name":"National Youth Talent Support Program"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Internet Technol."],"published-print":{"date-parts":[[2025,5,31]]},"abstract":"<jats:p>\n            WebAssembly has shown promising potential on various IoT devices to achieve the desired features such as multi-language support and seamless device-cloud integration. The execution performance of WebAssembly bytecode is directly influenced by compilation sequences. While existing research has explored the optimization of compilation sequences for native code, these approaches are not suitable to WebAssembly bytecode due to its unique instruction format and control flow graph structure. In this work, we propose WasmRL, a novel efficient deep reinforcement learning (DRL)-based compiler optimization framework tailored for WebAssembly bytecode. We conduct a fine-grained analysis of the characteristics of WebAssembly instructions and associated compilation flags. 
We observe that the same compilation sequence may yield contrasting performance outcomes in WebAssembly and native code. Motivated by our observation, we introduce a WebAssembly-specific DRL state representation that simultaneously captures the impact of various compilation sequences on the WebAssembly bytecode and its runtime performance. To enhance the training efficiency of the DRL model, we propose a tree-based action space refinement method. Furthermore, we develop a pluggable cross-platform training strategy to optimize WebAssembly bytecode across different IoT devices. We evaluate the performance of WasmRL extensively on PolybenchC, MiBench, Shootout public datasets and real-world IoT applications. Experimental results show: (1) The DRL model trained on a specific device achieves 1.4x\/1.1x speedups over\n            <jats:monospace>-O3<\/jats:monospace>\n            for seen\/unseen programs; (2) The DRL model trained on different devices simultaneously achieves 1.21x\/1.06x improvements respectively. 
The code is available at\n            <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"https:\/\/github.com\/CarrollAdmin\/WasmRL\">https:\/\/github.com\/CarrollAdmin\/WasmRL<\/jats:ext-link>\n            .\n          <\/jats:p>","DOI":"10.1145\/3731451","type":"journal-article","created":{"date-parts":[[2025,4,19]],"date-time":"2025-04-19T10:11:41Z","timestamp":1745057501000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Optimizing WebAssembly Bytecode for IoT Devices Using Deep Reinforcement Learning"],"prefix":"10.1145","volume":"25","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2872-7327","authenticated-orcid":false,"given":"Kaijie","family":"Gong","sequence":"first","affiliation":[{"name":"College of Computer Science, Zhejiang University, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-2585-4996","authenticated-orcid":false,"given":"Ruiqi","family":"Yang","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-1540-7331","authenticated-orcid":false,"given":"Haoyu","family":"Li","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7897-5965","authenticated-orcid":false,"given":"Yi","family":"Gao","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0498-1494","authenticated-orcid":false,"given":"Wei","family":"Dong","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]}],"member":"320","published-online":{"date-parts":[[2025,5,23]]},"reference":[{"key":"e_1_3_2_2_2","volume-title":"Machine Learning for Computer Architecture and Systems 2022","author":"Almakki Mohammed","year":"2022","unstructured":"Mohammed Almakki, Ayman Izzeldin, Qijing Huang, Ameer Haj Ali, and 
Chris Cummins. 2022. Autophase V2: Towards function level phase ordering optimization. In Machine Learning for Computer Architecture and Systems 2022."},{"key":"e_1_3_2_3_2","unstructured":"Tal Ben-Nun Alice Shoshana Jakobovits and Torsten Hoefler. 2018. Neural code comprehension: A learnable representation of code semantics. Proceedings of the 32nd International Conference on Neural Information Processing Systems (NeurIPS\u201918)."},{"key":"e_1_3_2_4_2","unstructured":"Craig Blackmore Oliver Ray and Kerstin Eder. 2017. Automatically tuning the gcc compiler to optimize the performance of applications running on embedded systems. arXiv:1703.08228. Retrieved from https:\/\/arxiv.org\/abs\/1703.08228"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE43902.2021.00110"},{"key":"e_1_3_2_6_2","first-page":"2244","volume-title":"International Conference on Machine Learning","author":"Cummins Chris","year":"2021","unstructured":"Chris Cummins, Zacharias V. Fisches, Tal Ben-Nun, Torsten Hoefler, Michael F. P. O\u2019Boyle, and Hugh Leather. 2021. Programl: A graph-based program representation for data flow analysis and compiler optimizations. In International Conference on Machine Learning. PMLR, 2244\u20132253."},{"key":"e_1_3_2_7_2","doi-asserted-by":"crossref","unstructured":"Chris Cummins Bram Wasti Jiadong Guo Brandon Cui Jason Ansel Sahir Gomez Somya Jain Jia Liu Olivier Teytaud Benoit Steiner Yuandong Tian and Hugh Leather. 2022. Compilergym: Robust performant compiler optimization environments for AI research. In 2022 IEEE\/ACM International Symposium on Code Generation and Optimization (CGO\u201922). 
IEEE 92\u2013105.","DOI":"10.1109\/CGO53902.2022.9741258"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM48880.2022.9796872"},{"key":"e_1_3_2_9_2","first-page":"1683","volume-title":"20th USENIX Symposium on Networked Systems Design and Implementation (NSDI\u201923)","author":"Dong Wei","year":"2023","unstructured":"Wei Dong, Borui Li, Haoyu Li, Hao Wu, Kaijie Gong, Wenzhao Zhang, and Yi Gao. 2023. LinkLab 2.0: A multi-tenant programmable IoT testbed for experimentation with Edge-Cloud integration. In 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI\u201923). 1683\u20131699."},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1007\/s12652-023-04534-8"},{"key":"e_1_3_2_11_2","first-page":"140","volume-title":"2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid\u201922)","author":"Gackstatter Philipp","year":"2022","unstructured":"Philipp Gackstatter, Pantelis A. Frangoudis, and Schahram Dustdar. 2022. Pushing serverless to the edge with webassembly runtimes. In 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid\u201922). IEEE, 140\u2013149."},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/2908961.2931696"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/3062341.3062363"},{"key":"e_1_3_2_14_2","unstructured":"Ameer Haj-Ali Qijing Jenny Huang John Xiang William Moses Krste Asanovic John Wawrzynek and Ion Stoica. 2020. Autophase: Juggling hls phase orderings in random forests with deep reinforcement learning. 
Proceedings of Machine Learning and Systems (MLSys\u201920) 2 (2020) 70\u201381."},{"key":"e_1_3_2_15_2","article-title":"WebAssembly and JavaScript challenge: Numerical program performance using modern browser technologies and devices","author":"Herrera David","year":"2018","unstructured":"David Herrera, Hangfen Chen, Erick Lavoie, and Laurie Hendren. 2018. WebAssembly and JavaScript challenge: Numerical program performance using modern browser technologies and devices. University of McGill, Montreal: QC, Technical report SABLE-TR-2018-2 (2018).","journal-title":"University of McGill, Montreal: QC, Technical report SABLE-TR-2018-2"},{"key":"e_1_3_2_16_2","first-page":"107","volume-title":"2019 USENIX Annual Technical Conference (USENIX ATC\u201919)","author":"Jangda Abhinav","year":"2019","unstructured":"Abhinav Jangda, Bobby Powers, Emery D. Berger, and Arjun Guha. 2019. Not so fast: Analyzing the performance of WebAssembly vs. native code. In 2019 USENIX Annual Technical Conference (USENIX ATC\u201919). 107\u2013120."},{"key":"e_1_3_2_17_2","first-page":"661","volume-title":"2023 38th IEEE\/ACM International Conference on Automated Software Engineering (ASE\u201923)","author":"Jiang Shuyao","year":"2023","unstructured":"Shuyao Jiang, Ruiying Zeng, Zihao Rao, Jiazhen Gu, Yangfan Zhou, and Michael R. Lyu. 2023. Revealing performance issues in server-side WebAssembly runtimes via differential testing. In 2023 38th IEEE\/ACM International Conference on Automated Software Engineering (ASE\u201923). IEEE, 661\u2013672."},{"key":"e_1_3_2_18_2","unstructured":"Guolin Ke Qi Meng Thomas Finley Taifeng Wang Wei Chen Weidong Ma Qiwei Ye and Tie-Yan Liu. 2017. Lightgbm: A highly efficient gradient boosting decision tree. 
Proceedings of the 31st International Conference on Neural Information Processing Systems (NeurIPS\u201917)."},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM42981.2021.9488424"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/3498361.3538922"},{"issue":"1","key":"e_1_3_2_21_2","first-page":"1","article-title":"Iterative compilation optimization based on metric learning and collaborative filtering","volume":"19","author":"Liu Hongzhi","year":"2021","unstructured":"Hongzhi Liu, Jie Luo, Ying Li, and Zhonghai Wu. 2021. Iterative compilation optimization based on metric learning and collaborative filtering. ACM Transactions on Architecture and Code Optimization (TACO) 19, 1 (2021), 1\u201325.","journal-title":"ACM Transactions on Architecture and Code Optimization (TACO)"},{"key":"e_1_3_2_22_2","first-page":"1","volume-title":"2020 IEEE\/ACM 6th Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC) and Workshop on Hierarchical Parallelism for Exascale Computing (HiPar\u201920)","author":"Mammadli Rahim","year":"2020","unstructured":"Rahim Mammadli, Ali Jannesari, and Felix Wolf. 2020. Static neural compiler optimization via deep reinforcement learning. In 2020 IEEE\/ACM 6th Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC) and Workshop on Hierarchical Parallelism for Exascale Computing (HiPar\u201920). IEEE, 1\u201311."},{"key":"e_1_3_2_23_2","doi-asserted-by":"crossref","unstructured":"Daniel J. Mankowitz Andrea Michi Anton Zhernov Marco Gelmi Marco Selvi Cosmin Paduraru Edouard Leurent Shariq Iqbal Jean-Baptiste Lespiau Alex Ahern Thomas K\u00f6ppe Kevin Millikin Stephen Gaffney Sophie Elster Jackson Broshear Chris Gamble Kieran Milan Robert Tung Minjae Hwang Taylan Cemgil Mohammadamin Barekatain Yujia Li Amol Mandhane Thomas Hubert Julian Schrittwieser Demis Hassabis Pushmeet Kohli Martin Riedmiller Oriol Vinyals and David Silver. 2023. 
Faster sorting algorithms discovered using deep reinforcement learning. Nature 618 7964 (2023) 257\u2013263.","DOI":"10.1038\/s41586-023-06004-9"},{"key":"e_1_3_2_24_2","first-page":"1177","volume-title":"2022 IEEE 42nd International Conference on Distributed Computing Systems (ICDCS\u201922)","author":"M\u00e9n\u00e9trey J\u00e4mes","year":"2022","unstructured":"J\u00e4mes M\u00e9n\u00e9trey, Marcelo Pasin, Pascal Felber, and Valerio Schiavoni. 2022. Watz: A Trusted WebAssembly runtime environment with remote attestation for TrustZone. In 2022 IEEE 42nd International Conference on Distributed Computing Systems (ICDCS\u201922). IEEE, 1177\u20131189."},{"key":"e_1_3_2_25_2","first-page":"118","volume-title":"2022 IEEE\/ACM International Symposium on Code Generation and Optimization (CGO\u201922)","author":"Park Sunghyun","year":"2022","unstructured":"Sunghyun Park, Salar Latifi, Yongjun Park, Armand Behroozi, Byungsoo Jeon, and Scott Mahlke. 2022. SRTuner: Effective compiler optimization customization by exposing synergistic relations. In 2022 IEEE\/ACM International Symposium on Code Generation and Optimization (CGO\u201922). IEEE, 118\u2013130."},{"key":"e_1_3_2_26_2","doi-asserted-by":"crossref","first-page":"202","DOI":"10.1007\/978-3-319-78133-4_15","volume-title":"Artificial Evolution: 13th International Conference, \u00c9volution Artificielle, EA 2017, Paris, France, October 25\u201327, 2017, Revised Selected Papers 13","author":"C\u00e1ceres Leslie P\u00e9rez","year":"2018","unstructured":"Leslie P\u00e9rez C\u00e1ceres, Federico Pagnozzi, Alberto Franzin, and Thomas St\u00fctzle. 2018. Automatic configuration of GCC using irace. In Artificial Evolution: 13th International Conference, \u00c9volution Artificielle, EA 2017, Paris, France, October 25\u201327, 2017, Revised Selected Papers 13. 
Springer, 202\u2013216."},{"key":"e_1_3_2_27_2","doi-asserted-by":"crossref","first-page":"250","DOI":"10.1145\/3274783.3274842","volume-title":"Proceedings of the 16th ACM Conference on Embedded Networked Sensor Systems","author":"Reijers Niels","year":"2018","unstructured":"Niels Reijers and Chi-Sheng Shih. 2018. CapeVM: A safe and fast virtual machine for resource-constrained Internet-of-Things devices. In Proceedings of the 16th ACM Conference on Embedded Networked Sensor Systems. 250\u2013263."},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/3543507.3583235"},{"key":"e_1_3_2_29_2","first-page":"455","volume-title":"2022 USENIX Annual Technical Conference (USENIX ATC\u201922)","author":"Sang Fan","year":"2022","unstructured":"Fan Sang, Ming-Wei Shih, Sangho Lee, Xiaokuan Zhang, Michael Steiner, Mona Vij, and Taesoo Kim. 2022. PRIDWEN: Universally hardening SGX programs via Load-Time synthesis. In 2022 USENIX Annual Technical Conference (USENIX ATC\u201922). 455\u2013472."},{"key":"e_1_3_2_30_2","unstructured":"John Schulman Filip Wolski Prafulla Dhariwal Alec Radford and Oleg Klimov. 2017. Proximal policy optimization algorithms. arXiv:1707.06347. Retrieved from https:\/\/arxiv.org\/abs\/1707.06347"},{"key":"e_1_3_2_31_2","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1109\/LLVM-HPC56686.2022.00007","volume-title":"2022 IEEE\/ACM Eighth Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC\u201922)","author":"Shahzad Hafsah","year":"2022","unstructured":"Hafsah Shahzad, Ahmed Sanaullah, Sanjay Arora, Robert Munafo, Xiteng Yao, Ulrich Drepper, and Martin Herbordt. 2022. Reinforcement learning strategies for compiler optimization in high level synthesis. In 2022 IEEE\/ACM Eighth Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC\u201922). 
IEEE, 13\u201322."},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPSN61024.2024.00021"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/3563311"},{"key":"e_1_3_2_34_2","first-page":"1","volume-title":"2022 11th Mediterranean Conference on Embedded Computing (MECO\u201922)","author":"Wallentowitz Stefan","year":"2022","unstructured":"Stefan Wallentowitz, Bastian Kersting, and Dan Mihai Dumitriu. 2022. Potential of WebAssembly for embedded systems. In 2022 11th Mediterranean Conference on Embedded Computing (MECO\u201922). IEEE, 1\u20134."},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1145\/3497776.3517769"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM48880.2022.9796958"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/3487552.3487827"}],"container-title":["ACM Transactions on Internet Technology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3731451","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3731451","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:57:19Z","timestamp":1750298239000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3731451"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,23]]},"references-count":36,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,5,31]]}},"alternative-id":["10.1145\/3731451"],"URL":"https:\/\/doi.org\/10.1145\/3731451","relation":{},"ISSN":["1533-5399","1557-6051"],"issn-type":[{"type":"print","value":"1533-5399"},{"type":"electronic","value":"1557-6051"}],"subject":[],"published":{"date-parts":[[2025,5,23]]},"assertion":[{"value":"2024-12-15","order":0,"name":"received","label":"Received","group":{"name":"pub
lication_history","label":"Publication History"}},{"value":"2025-04-04","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-05-23","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}