{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T16:00:34Z","timestamp":1772208034535,"version":"3.50.1"},"reference-count":46,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2021,12,27]],"date-time":"2021-12-27T00:00:00Z","timestamp":1640563200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["U19B2019, 61832007, 61621091"],"award-info":[{"award-number":["U19B2019, 61832007, 61621091"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100012166","name":"National Key R&D Program of China","doi-asserted-by":"crossref","award":["2018YFB0105005, 2017YFA02077600"],"award-info":[{"award-number":["2018YFB0105005, 2017YFA02077600"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100002858","name":"China Postdoctoral Science Foundation","doi-asserted-by":"crossref","award":["2019M660641"],"award-info":[{"award-number":["2019M660641"]}],"id":[{"id":"10.13039\/501100002858","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Beijing Innovation Center for Future Chips"},{"name":"Tsinghua EE Xilinx AI Research Fund"},{"DOI":"10.13039\/501100017582","name":"Beijing National Research Center for Information Science and Technology","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100017582","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Reconfigurable Technol. Syst."],"published-print":{"date-parts":[[2022,9,30]]},"abstract":"<jats:p>INFerence-as-a-Service (INFaaS) has become a primary workload in the cloud. 
However, existing FPGA-based Deep Neural Network (DNN) accelerators are mainly optimized for the fastest speed of a single task, while the multi-tenancy of INFaaS has not been explored yet. As the demand for INFaaS keeps growing, simply increasing the number of FPGA-based DNN accelerators is not cost-effective, while merely sharing these single-task optimized DNN accelerators in a time-division multiplexing way could lead to poor isolation and high performance loss for INFaaS. On the other hand, current cloud-based DNN accelerators have excessive compilation overhead, especially when scaling out to multi-FPGA systems for multi-tenant sharing, leading to unacceptable compilation costs for both offline deployment and online reconfiguration. Therefore, existing designs are far from providing efficient and flexible FPGA virtualization for public and private cloud scenarios.<\/jats:p>\n          <jats:p>Aiming to solve these problems, we propose a unified virtualization framework for general-purpose deep neural networks in the cloud, enabling multi-tenant sharing for both Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) accelerators on a single FPGA. The isolation is enabled by introducing a two-level instruction dispatch module and a multi-core-based hardware resource pool. Such designs provide isolated and runtime-programmable hardware resources, which further lead to performance isolation for multi-tenant sharing. On the other hand, to overcome the heavy re-compilation overheads, a tiling-based instruction frame package design and a two-stage static-dynamic compilation are proposed. Only the lightweight runtime information is re-compiled with \u223c1 ms overhead, thus guaranteeing the private cloud\u2019s performance. 
Finally, the extensive experimental results show that the proposed virtualized solutions achieve up to 3.12\u00d7 and 6.18\u00d7 higher throughput in the private cloud compared with the static CNN and RNN baseline designs, respectively.<\/jats:p>","DOI":"10.1145\/3480170","type":"journal-article","created":{"date-parts":[[2021,12,28]],"date-time":"2021-12-28T06:37:50Z","timestamp":1640673470000},"page":"1-31","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":8,"title":["A Unified FPGA Virtualization Framework for General-Purpose Deep Neural Networks in the Cloud"],"prefix":"10.1145","volume":"15","author":[{"given":"Shulin","family":"Zeng","sequence":"first","affiliation":[{"name":"Tsinghua University, Beijing, China"}]},{"given":"Guohao","family":"Dai","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}]},{"given":"Hanbo","family":"Sun","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}]},{"given":"Jun","family":"Liu","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}]},{"given":"Shiyao","family":"Li","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}]},{"given":"Guangjun","family":"Ge","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}]},{"given":"Kai","family":"Zhong","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}]},{"given":"Kaiyuan","family":"Guo","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}]},{"given":"Yu","family":"Wang","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}]},{"given":"Huazhong","family":"Yang","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, 
China"}]}],"member":"320","published-online":{"date-parts":[[2021,12,27]]},"reference":[{"key":"e_1_3_1_2_2","first-page":"265","volume-title":"Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation","author":"Abadi Mart\u00edn","year":"2016","unstructured":"Mart\u00edn Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2016. Tensorflow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation. USENIX, 265\u2013283."},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPL.2018.00077"},{"key":"e_1_3_1_4_2","unstructured":"Alibaba. 2019. Alibaba F1. Retrieved December 20 2020 from https:\/\/www.aliyun.com\/product\/ecs\/fpga?spm=5176.224200.100.29.813f6ed6OuUlZ2&aly_as=x0_o5Br."},{"key":"e_1_3_1_5_2","unstructured":"Amazon. 2019. AWS F1. 
Retrieved December 20 2020 from https:\/\/aws.amazon.com\/ec2\/instance-types\/f1\/."},{"key":"e_1_3_1_6_2","first-page":"173","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Amodei Dario","year":"2016","unstructured":"Dario Amodei, Sundaram Ananthanarayanan, Rishita Anubhai, Jingliang Bai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Qiang Cheng, Guoliang Chen, Jie Chen, Jingdong Chen, Zhijie Chen, Mike Chrzanowski, Adam Coates, Greg Diamos, Ke Ding, Niandong Du, Erich Elsen, Jesse Engel, Weiwei Fang, Linxi Fan, Christopher Fougner, Liang Gao, Caixia Gong, Awni Hannun, Tony Han, Lappi Johannes, Bing Jiang, Cai Ju, Billy Jun, Patrick LeGresley, Libby Lin, Junjie Liu, Yang Liu, Weigao Li, Xiangang Li, Dongpeng Ma, Sharan Narang, Andrew Ng, Sherjil Ozair, Yiping Peng, Ryan Prenger, Sheng Qian, Zongfeng Quan, Jonathan Raiman, Vinay Rao, Sanjeev Satheesh, David Seetapun, Shubho Sengupta, Kavya Srinet, Anuroop Sriram, Haiyuan Tang, Liliang Tang, Chong Wang, Jidong Wang, Kaifu Wang, Yi Wang, Zhijian Wang, Zhiqian Wang, Shuang Wu, Likai Wei, Bo Xiao, Wen Xie, Yan Xie, Dani Yogatama, Bin Yuan, Jun Zhan, Zhenyao Zhu. 2016. Deep speech 2: End-to-end speech recognition in english and mandarin. In Proceedings of the International Conference on Machine Learning. 173\u2013182."},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/3020078.3021738"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2012.25"},{"key":"e_1_3_1_9_2","first-page":"1","volume-title":"IEEE International Symposium on Circuits and Systems (ISCAS)","author":"Gokhale Vinayak","year":"2017","unstructured":"Vinayak Gokhale, Aliasger Zaidy, Andre Xian Ming Chang, and Eugenio Culurciello. 2017. Snowflake: An efficient hardware accelerator for convolutional neural networks. In IEEE International Symposium on Circuits and Systems (ISCAS). 
IEEE, 1\u20134."},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/2597917.2597929"},{"key":"e_1_3_1_11_2","first-page":"578","volume-title":"Proceedings of the USENIX Symposium on Operating Systems Design and Implementation","author":"Chen Tianqi","year":"2018","unstructured":"Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy. 2018. TVM: An automated end-to-end optimizing compiler for deep learning. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation. USENIX, 578\u2013594."},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/3289602.3293915"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA47549.2020.00027"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2014.12"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPT.2014.7082811"},{"key":"e_1_3_1_16_2","first-page":"51","volume-title":"Proceedings of the USENIX Symposium on Networked Systems Design and Implementation","year":"2018","unstructured":"Daniel Firestone, Andrew Putnam, Sambhrama Mundkur, Derek Chiou, Alireza Dabagh, Mike Andrewartha, Hari Angepat, Vivek Bhanu, Adrian Caulfield, Eric Chung, Harish Kumar Chandrappa, Somesh Chaturmohta, Matt Humphrey, Jack Lavier, Norman Lam, Fengfen Liu, Kalin Ovtcharov, Jitu Padhye, Gautham Popuri, Shachar Raindel, Tejas Sapre, Mark Shaw, Gabriel Silva, Madhan Sivakumar, Nisheeth Srivastava, Anshuman Verma, Qasim Zuhair, Deepak Bansal, Doug Burger, Kushagra Vaid, David A. Maltz, and Albert Greenberg. 2018. Azure accelerated networking: SmartNICs in the public cloud. In Proceedings of the USENIX Symposium on Networked Systems Design and Implementation. 
USENIX, 51\u201366."},{"key":"e_1_3_1_17_2","first-page":"1","volume-title":"Proceedings of the Annual International Symposium on Computer Architecture","year":"2018","unstructured":"Jeremy Fowers, Kalin Ovtcharov, Michael Papamichael, Todd Massengill, Ming Liu, Daniel Lo, Shlomi Alkalay, Michael Haselman, Logan Adams, Mahdi Ghandi, Stephen Heil, Prerak Patel, Adam Sapek, Gabriel Weisz, Lisa Woods, Sitaram Lanka, Steven K. Reinhardt, Adrian M. Caulfield, Eric S. Chung, and Doug Burger. 2018. A configurable cloud-scale DNN processor for real-time AI. In Proceedings of the Annual International Symposium on Computer Architecture. IEEE, 1\u201314."},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2017.2705069"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/3289185"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA45697.2020.00084"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/3020078.3021745"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_1_23_2","unstructured":"Andrew G. Howard Menglong Zhu Bo Chen Dmitry Kalenichenko Weijun Wang Tobias Weyand Marco Andreetto and Hartwig Adam. 2017. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861. Retrieved from https:\/\/arxiv.org\/abs\/1704.04861."},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/RTAS.2019.00011"},{"key":"e_1_3_1_25_2","unstructured":"Andy Jassy. 2018. AWS re:Invent 2018. 
Retrieved December 20 2020 from https:\/\/www.youtube.com\/watch?v=ZOIkOnW640A."},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/RTSS46320.2019.00037"},{"key":"e_1_3_1_27_2","first-page":"107","volume-title":"Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation","author":"Khawaja Ahmed","year":"2018","unstructured":"Ahmed Khawaja, Joshua Landgraf, Rohith Prakash, Michael Wei, Eric Schkufza, and Christopher J. Rossbach. 2018. Sharing, protection, and compatibility for reconfigurable fabric with amorphos. In Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation. 107\u2013127."},{"key":"e_1_3_1_28_2","unstructured":"Oliver Knodel Paul R. Genssler and Rainer G. Spallek. 2017. Virtualizing reconfigurable hardware to provide scalability in cloud architectures. In International Conference on Advances in Circuits Electronics and Micro-electronics (CENICS). 33\u201338."},{"key":"e_1_3_1_29_2","volume-title":"Proceedings of the 16th Design, Automation & Test in Europe Conference and Exhibition","author":"Kotaba Ondrej","year":"2013","unstructured":"Ondrej Kotaba, Jan Nowotsch, Michael Paulitsch, Stefan M. Petters, and Henrik Theiling. 2013. Multicore in real-time systems\u2013temporal isolation challenges due to shared resources. In Proceedings of the 16th Design, Automation & Test in Europe Conference and Exhibition."},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2020.2974843"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2019.00035"},{"key":"e_1_3_1_32_2","unstructured":"NVIDIA. 2019. CUDA multi process service overview. Retrieved December 20 2020 from https:\/\/docs.nvidia.com\/deploy\/pdf\/CUDA_Multi_Process_Service_Overview.pdf."},{"key":"e_1_3_1_33_2","unstructured":"Nvidia. 2020. Nvidia A100 tensor core GPU architecture. 
Retrieved December 20 2020 from https:\/\/www.nvidia.cn\/data-center\/a100\/."},{"key":"e_1_3_1_34_2","volume-title":"Proceedings of the IEEE Hot Chips Symposium","author":"Ouyang Jian","year":"2017","unstructured":"Jian Ouyang. 2017. XPU: A programmable FPGA accelerator for diverse workloads. In Proceedings of the IEEE Hot Chips Symposium."},{"key":"e_1_3_1_35_2","volume-title":"Proceedings of the IEEE Hot Chips Symposium","author":"Ouyang Jian","year":"2016","unstructured":"Jian Ouyang, Wei Qi, Wang Yong, Yichen Tu, Jing Wang, and Bowen Jia. 2016. SDA: Software-defined accelerator for general-purpose distributed big data analysis system. In Proceedings of the IEEE Hot Chips Symposium. IEEE."},{"key":"e_1_3_1_36_2","unstructured":"Jongsoo Park Maxim Naumov Protonu Basu Summer Deng Aravind Kalaiah Daya Khudia James Law Parth Malani Andrey Malevich Satish Nadathur Juan Pino Martin Schatz Alexander Sidorov Viswanath Sivakumar Andrew Tulloch Xiaodong Wang Yiming Wu Hector Yuen Utku Diril Dmytro Dzhulgakov Kim Hazelwood Bill Jia Yangqing Jia Lin Qiao Vijay Rao Nadav Rotem Sungjoo Yoo and Mikhail Smelyanskiy. 2018. Deep learning inference in facebook data centers: Characterization performance optimizations and hardware implications. arXiv:1811.09886. Retrieved from https:\/\/arxiv.org\/abs\/1811.09886."},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/2678373.2665678"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/DAC18072.2020.9218652"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPL.2018.00024"},{"key":"e_1_3_1_40_2","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556. 
Retrieved from https:\/\/arxiv.org\/abs\/1409.1556."},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.308"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPL.2018.00031"},{"key":"e_1_3_1_43_2","unstructured":"Xilinx. 2018. Accelerating DNNs with Xilinx Alveo Accelerator Cards. Retrieved December 20 2020 from https:\/\/www.xilinx.com\/applications\/megatrends\/machine-learning.html."},{"key":"e_1_3_1_44_2","unstructured":"Xilinx. 2019. AXI Interconnect IP. Retrieved from https:\/\/www.xilinx.com\/products\/intellectual-property\/axi_interconnect.html."},{"key":"e_1_3_1_45_2","article-title":"DNNVM: End-to-end compiler leveraging heterogeneous optimizations on FPGA-based CNN accelerators","author":"Xing Yu","year":"2019","unstructured":"Yu Xing, Shuang Liang, Lingzhi Sui, Xijie Jia, Jiantao Qiu, Xin Liu, Yushun Wang, Yi Shan, and Yu Wang. 2019. DNNVM: End-to-end compiler leveraging heterogeneous optimizations on FPGA-based CNN accelerators. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 39, 10 (2019), 2668\u20132681.","journal-title":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems"},{"key":"e_1_3_1_46_2","article-title":"Caffeine: Towards uniformed representation and acceleration for deep convolutional neural networks","author":"Zhang Chen","year":"2018","unstructured":"Chen Zhang, Guangyu Sun, Zhenman Fang, Peipei Zhou, Peichen Pan, and Jason Cong. 2018. Caffeine: Towards uniformed representation and acceleration for deep convolutional neural networks. 
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 38, 11 (2018), 2072\u20132085.","journal-title":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1145\/3240765.3240801"}],"container-title":["ACM Transactions on Reconfigurable Technology and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3480170","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3480170","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:31:16Z","timestamp":1750188676000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3480170"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,12,27]]},"references-count":46,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2022,9,30]]}},"alternative-id":["10.1145\/3480170"],"URL":"https:\/\/doi.org\/10.1145\/3480170","relation":{},"ISSN":["1936-7406","1936-7414"],"issn-type":[{"value":"1936-7406","type":"print"},{"value":"1936-7414","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,12,27]]},"assertion":[{"value":"2021-01-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-08-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-12-27","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}