{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,10]],"date-time":"2025-12-10T08:56:40Z","timestamp":1765357000915,"version":"3.41.0"},"reference-count":48,"publisher":"Association for Computing Machinery (ACM)","issue":"5s","license":[{"start":{"date-parts":[[2021,9,17]],"date-time":"2021-09-17T00:00:00Z","timestamp":1631836800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"CAPES - Brasil - Finance Code","award":["001"],"award-info":[{"award-number":["001"]}]},{"name":"FAPERGS and CNPq"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2021,10,31]]},"abstract":"<jats:p>FPGAs, because of their energy efficiency, reconfigurability, and easily tunable HLS designs, have been used to accelerate an increasing number of machine learning, especially CNN-based, applications. As a representative example, IoT Edge applications, which require low latency processing of resource-hungry CNNs, offload the inferences from resource-limited IoT end nodes to Edge servers featuring FPGAs. However, the ever-increasing number of end nodes pressures these FPGA-based servers with new performance and adaptability challenges. While some works have exploited CNN optimizations to alleviate inferences\u2019 computation and memory burdens, others have exploited HLS to tune accelerators for statically defined optimization goals. However, these works have not tackled both CNN and HLS optimizations altogether; neither have they provided any adaptability at runtime, where the workload\u2019s characteristics are unpredictable. 
In this context, we propose a hybrid two-step approach that, first, creates new optimization opportunities at design-time through the automatic training of CNN model variants (obtained via pruning) and the automatic generation of versions of convolutional accelerators (obtained during HLS synthesis); and, second, synergistically exploits these created CNN and HLS optimization opportunities to deliver a fully dynamic Multi-FPGA system that adapts its resources in a fully automatic or user-configurable manner. We implement this two-step approach as the AdaServ Framework and show, through a smart video surveillance Edge application as a case study, that it adapts to the always-changing Edge conditions: AdaServ processes at least 3.37\u00d7 more inferences (using the automatic approach) and is at least 6.68\u00d7 more energy-efficient (user-configurable approach) than original convolutional accelerators and CNN Models (VGG-16 and AlexNet). We also show that AdaServ achieves better results than solutions dynamically changing only the CNN model or HLS version, highlighting the importance of exploring both; and that it is always better than the best statically chosen CNN model and HLS version, showing the need for dynamic adaptability.<\/jats:p>","DOI":"10.1145\/3476990","type":"journal-article","created":{"date-parts":[[2021,9,17]],"date-time":"2021-09-17T18:36:51Z","timestamp":1631903811000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":15,"title":["Synergistically Exploiting CNN Pruning and HLS Versioning for Adaptive Inference on Multi-FPGAs at the Edge"],"prefix":"10.1145","volume":"20","author":[{"given":"Guilherme","family":"Korol","sequence":"first","affiliation":[{"name":"Institute of Informatics - Federal University of Rio Grande do Sul, Porto Alegre, Brazil"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Michael 
Guilherme","family":"Jordan","sequence":"additional","affiliation":[{"name":"Institute of Informatics - Federal University of Rio Grande do Sul, Porto Alegre, Brazil"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mateus Beck","family":"Rutzig","sequence":"additional","affiliation":[{"name":"Electronics and Computing Department - Federal University of Santa Maria, Santa Maria, Brazil"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Antonio Carlos Schneider","family":"Beck","sequence":"additional","affiliation":[{"name":"Institute of Informatics - Federal University of Rio Grande do Sul, Porto Alegre, Brazil"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,9,17]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"et\u00a0al","author":"Baskin Chaim","year":"2018","unstructured":"Chaim Baskin, Natan Liss, Evgenii Zheltonozhskii, et\u00a0al. 2018. Streaming architecture for large-scale quantized neural networks on an FPGA-Based dataflow platform. In IPDPS. IEEE Computer Society, 162\u2013169."},{"key":"e_1_2_1_2_1","volume-title":"Carlos Arthur Lang Lisb\u00f4a, and Luigi Carro","author":"Beck Antonio Carlos S.","year":"2012","unstructured":"Antonio Carlos S. Beck, Carlos Arthur Lang Lisb\u00f4a, and Luigi Carro. 2012. Adaptable embedded systems. 
Springer."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/2024716.2024718"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3242897"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/2742647.2742663"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/3301418.3313946"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2019.2921977"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10462-020-09816-7"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.5555\/3154630.3154681"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.5555\/2999792.2999852"},{"key":"e_1_2_1_11_1","volume-title":"et\u00a0al","author":"Fang Biyi","year":"2020","unstructured":"Biyi Fang, Xiao Zeng, Faen Zhang, et\u00a0al. 2020. FlexDNN: Input-adaptive on-device deep learning for efficient mobile vision. In SEC. IEEE, 84\u201395."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3241539.3241559"},{"key":"e_1_2_1_13_1","volume-title":"et\u00a0al","author":"Faraone Julian","year":"2018","unstructured":"Julian Faraone, Giulio Gambardella, Nicholas J. Fraser, et\u00a0al. 2018. Customizing low-precision deep neural networks for FPGAs. In FPL. 
IEEE Computer Society, 97\u2013100."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-56258-2_23"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.5555\/3086952"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.5555\/3454287.3455008"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.5555\/3157096.3157251"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.5555\/2969239.2969366"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3316781.3317829"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2872887.2749472"},{"key":"e_1_2_1_22_1","volume-title":"et\u00a0al","author":"Houben Sebastian","year":"2013","unstructured":"Sebastian Houben, Johannes Stallkamp, Jan Salmen, et\u00a0al. 2013. Detection of Traffic signs in real-world images: The German traffic sign detection benchmark. In IJCNN. IEEE, 1\u20138."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/TBC.2008.2001246"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3366636"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2647868.2654889"},{"key":"e_1_2_1_26_1","volume-title":"et\u00a0al","author":"Jiang Shuang","year":"2018","unstructured":"Shuang Jiang, Dong He, Chenxi Yang, et\u00a0al. 2018. Accelerating mobile applications at the network edge with software-programmable FPGAs. In INFOCOM. IEEE, 55\u201362."},{"key":"e_1_2_1_27_1","volume-title":"et\u00a0al","author":"Jiang Shuang","year":"2020","unstructured":"Shuang Jiang, Zhiyao Ma, Xiao Zeng, et\u00a0al. 2020. 
SCYLLA: QoE-aware continuous mobile vision with FPGA-based dynamic deep neural network reconfiguration. In INFOCOM. IEEE, 1369\u20131378."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3358192"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2019.2954546"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.5555\/2755753.2755788"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.5555\/2999134.2999257"},{"key":"e_1_2_1_33_1","volume-title":"et\u00a0al","author":"Li Hao","year":"2017","unstructured":"Hao Li, Asim Kadav, Igor Durdanovic, et\u00a0al. 2017. Pruning filters for efficient convnets. In ICLR. OpenReview.net."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/MCOM.2017.1700168"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/TII.2018.2842821"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3020078.3021740"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2013.257"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.5555\/2971808.2971918"},{"key":"e_1_2_1_39_1","volume-title":"V\u00e9stias","author":"Peres Tiago","year":"2019","unstructured":"Tiago Peres, Ana Gon\u00e7alves, and M\u00e1rio P. V\u00e9stias. 2019. Faster convolutional neural networks in low density FPGAs using block pruning. In ARC(Lecture Notes in Computer Science, Vol. 11444). 
Springer, 402\u2013416."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-77610-1_23"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-015-0816-y"},{"key":"e_1_2_1_42_1","unstructured":"Karen Simonyan and Andrew Zisserman. 2015. Very Deep convolutional networks for large-scale image recognition. In ICLR."},{"key":"e_1_2_1_43_1","volume-title":"Ardalan Amiri Sani, and Eli Bozorgzadeh","author":"Ting Hsin-Yu","year":"2020","unstructured":"Hsin-Yu Ting, Tootiya Giyahchi, Ardalan Amiri Sani, and Eli Bozorgzadeh. 2020. Dynamic sharing in multi-accelerators of neural networks on an FPGA edge device. In ASAP. IEEE, 197\u2013204."},{"key":"e_1_2_1_44_1","unstructured":"Xilinx Inc. 2020. A HLS-based Deep Neural Network Accelerator library for Xilinx Ultrascale+ MPSoC devices. https:\/\/github.com\/Xilinx\/CHaiDNN."},{"key":"e_1_2_1_45_1","unstructured":"Xilinx Inc. 2020. Smart World AI Video Analytics: Real-Time Analytics For A Smarter Safer World. 
https:\/\/www.xilinx.com\/applications\/data-center\/video-imaging\/v ideo-ai-analytics.html."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/3316781.3324696"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3140659.3080215"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/2684746.2689060"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/2789168.2790123"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/2897937.2898040"}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3476990","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3476990","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:30:46Z","timestamp":1750188646000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3476990"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,9,17]]},"references-count":48,"journal-issue":{"issue":"5s","published-print":{"date-parts":[[2021,10,31]]}},"alternative-id":["10.1145\/3476990"],"URL":"https:\/\/doi.org\/10.1145\/3476990","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"type":"print","value":"1539-9087"},{"type":"electronic","value":"1558-3465"}],"subject":[],"published":{"date-parts":[[2021,9,17]]},"assertion":[{"value":"2021-04-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-07-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-09-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication 
History"}}]}}