{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,31]],"date-time":"2025-10-31T07:59:26Z","timestamp":1761897566638,"version":"3.41.0"},"reference-count":38,"publisher":"Association for Computing Machinery (ACM)","issue":"5s","license":[{"start":{"date-parts":[[2021,9,17]],"date-time":"2021-09-17T00:00:00Z","timestamp":1631836800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2021,10,31]]},"abstract":"<jats:p>Model compression through quantization is commonly applied to convolutional neural networks (CNNs) deployed on compute and memory-constrained embedded platforms. Different layers of the CNN can have varying degrees of numerical precision for both weights and activations, resulting in a large search space. Together with the hardware (HW) design space, the challenge of finding the globally optimal HW-CNN combination for a given application becomes daunting. To this end, we propose HW-FlowQ, a systematic approach that enables the co-design of the target hardware platform and the compressed CNN model through quantization. The search space is viewed at three levels of abstraction, allowing for an iterative approach for narrowing down the solution space before reaching a high-fidelity CNN hardware modeling tool, capable of capturing the effects of mixed-precision quantization strategies on different hardware architectures (processing unit counts, memory levels, cost models, dataflows) and two types of computation engines (bit-parallel vectorized, bit-serial). To combine both worlds, a multi-objective non-dominated sorting genetic algorithm (NSGA-II) is leveraged to establish a Pareto-optimal set of quantization strategies for the target HW-metrics at each abstraction level. HW-FlowQ detects optima in a discrete search space and maximizes the task-related accuracy of the underlying CNN while minimizing hardware-related costs. The Pareto-front approach keeps the design space open to a range of non-dominated solutions before refining the design to a more detailed level of abstraction. With equivalent prediction accuracy, we improve the energy and latency by 20% and 45% respectively for ResNet56 compared to existing mixed-precision search methods.<\/jats:p>","DOI":"10.1145\/3476997","type":"journal-article","created":{"date-parts":[[2021,9,17]],"date-time":"2021-09-17T18:36:51Z","timestamp":1631903811000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":10,"title":["HW-FlowQ: A Multi-Abstraction Level HW-CNN Co-design Quantization Methodology"],"prefix":"10.1145","volume":"20","author":[{"given":"Nael","family":"Fasfous","sequence":"first","affiliation":[{"name":"Technical University of Munich, Munich, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Manoj Rohit","family":"Vemparala","sequence":"additional","affiliation":[{"name":"BMW Autonomous Driving, Munich, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Alexander","family":"Frickenstein","sequence":"additional","affiliation":[{"name":"BMW Autonomous Driving, Munich, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Emanuele","family":"Valpreda","sequence":"additional","affiliation":[{"name":"Politecnico di Torino, Turin, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Driton","family":"Salihu","sequence":"additional","affiliation":[{"name":"Technical University of Munich, Munich, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nguyen Anh Vu","family":"Doan","sequence":"additional","affiliation":[{"name":"Technical University of Munich, Munich, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Christian","family":"Unger","sequence":"additional","affiliation":[{"name":"BMW Autonomous Driving, Munich, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Naveen Shankar","family":"Nagaraja","sequence":"additional","affiliation":[{"name":"BMW Autonomous Driving, Munich, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Maurizio","family":"Martina","sequence":"additional","affiliation":[{"name":"Politecnico di Torino, Turin, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Walter","family":"Stechele","sequence":"additional","affiliation":[{"name":"Technical University of Munich, Munich, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,9,17]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.5555\/3437539.3437731"},{"key":"e_1_2_1_2_1","volume-title":"Estimating or propagating gradients through stochastic neurons for conditional computation. abs\/1308.3432","author":"Bengio Yoshua","year":"2013","unstructured":"Yoshua Bengio , Nicholas L\u00e9onard , and Aaron Courville . 2013. Estimating or propagating gradients through stochastic neurons for conditional computation. abs\/1308.3432 ( 2013 ). arXiv:1308.3432 Yoshua Bengio, Nicholas L\u00e9onard, and Aaron Courville. 2013. Estimating or propagating gradients through stochastic neurons for conditional computation. abs\/1308.3432 (2013). arXiv:1308.3432"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3242897"},{"key":"e_1_2_1_4_1","volume-title":"Rethinking atrous convolution for semantic image segmentation. abs\/1706.05587","author":"Chen Liang-Chieh","year":"2017","unstructured":"Liang-Chieh Chen , George Papandreou , Florian Schroff , and Hartwig Adam . 2017. Rethinking atrous convolution for semantic image segmentation. abs\/1706.05587 ( 2017 ). arXiv:1706.05587v3 Liang-Chieh Chen, George Papandreou, Florian Schroff, and Hartwig Adam. 2017. Rethinking atrous convolution for semantic image segmentation. abs\/1706.05587 (2017). arXiv:1706.05587v3"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2016.40"},{"key":"e_1_2_1_6_1","volume-title":"Vijayalakshmi Srinivasan, and Kailash Gopalakrishnan.","author":"Choi Jungwook","year":"2018","unstructured":"Jungwook Choi , Zhuo Wang , Swagath Venkataramani , Pierce I-Jen Chuang , Vijayalakshmi Srinivasan, and Kailash Gopalakrishnan. 2018 . PACT : Parameterized clipping activation for quantized neural networks. ArXiv abs\/1805.06085 (2018). arXiv:1805.06085 Jungwook Choi, Zhuo Wang, Swagath Venkataramani, Pierce I-Jen Chuang, Vijayalakshmi Srinivasan, and Kailash Gopalakrishnan. 2018. PACT: Parameterized clipping activation for quantized neural networks. ArXiv abs\/1805.06085 (2018). arXiv:1805.06085"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.350"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01166"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/4235.996017"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00038"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.5555\/3408352.3408731"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.5555\/3437539.3437589"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2019.00175"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3037697.3037702"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3297858.3304014"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01234-2_48"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC.2014.6757323"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/WACV.2018.00083"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.5555\/3122009.3242044"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2020.2986127"},{"volume-title":"Learning multiple layers of features from tiny images. (2009)","author":"Krizhevsky Alex","key":"e_1_2_1_22_1","unstructured":"Alex Krizhevsky . 2009. Learning multiple layers of features from tiny images. (2009) . University of Toronto . Alex Krizhevsky. 2009. Learning multiple layers of features from tiny images. (2009). University of Toronto."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3065386"},{"key":"e_1_2_1_24_1","volume-title":"NeurIPS Workshop.","author":"Lin Yujun","year":"2019","unstructured":"Yujun Lin , Driss Hafdi , Kuan Wang , Zhijian Liu , and Song Han . 2019 . Neural-hardware architecture search . In NeurIPS Workshop. Yujun Lin, Driss Hafdi, Kuan Wang, Zhijian Liu, and Song Han. 2019. Neural-hardware architecture search. In NeurIPS Workshop."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2018.2815603"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3285017.3285019"},{"key":"e_1_2_1_27_1","doi-asserted-by":"crossref","unstructured":"G. De Micheli A. Sangiovanni-Vincentelli and P. Antognetti. 1987. Design Systems for VLSI Circuits: Logic Synthesis and Silicon Compilation. Springer Netherlands.  G. De Micheli A. Sangiovanni-Vincentelli and P. Antognetti. 1987. Design Systems for VLSI Circuits: Logic Synthesis and Silicon Compilation. Springer Netherlands.","DOI":"10.1007\/978-94-009-3649-2"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2019.00042"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46493-0_32"},{"key":"e_1_2_1_30_1","unstructured":"Baidu Research. [n.d.]. DeepBench. https:\/\/github.com\/baidu-research\/DeepBench.  Baidu Research. [n.d.]. DeepBench. https:\/\/github.com\/baidu-research\/DeepBench."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-015-0816-y"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3195970.3196072"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.5555\/559628"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00881"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00215"},{"key":"e_1_2_1_36_1","volume-title":"Mixed precision quantization of convnets via differentiable neural architecture search. abs\/1812.00090","author":"Wu Bichen","year":"2018","unstructured":"Bichen Wu , Yanghan Wang , Peizhao Zhang , Yuandong Tian , Peter Vajda , and Kurt Keutzer . 2018. Mixed precision quantization of convnets via differentiable neural architecture search. abs\/1812.00090 ( 2018 ). arXiv:1812.00090 Bichen Wu, Yanghan Wang, Peizhao Zhang, Yuandong Tian, Peter Vajda, and Kurt Keutzer. 2018. Mixed precision quantization of convnets via differentiable neural architecture search. abs\/1812.00090 (2018). arXiv:1812.00090"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3373376.3378514"},{"key":"e_1_2_1_38_1","volume-title":"DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients. abs\/1606.06160","author":"Zhou Shuchang","year":"2016","unstructured":"Shuchang Zhou , Yuxin Wu , Zekun Ni , Xinyu Zhou , He Wen , and Yuheng Zou . 2016. DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients. abs\/1606.06160 ( 2016 ). arXiv:1606.06160 Shuchang Zhou, Yuxin Wu, Zekun Ni, Xinyu Zhou, He Wen, and Yuheng Zou. 2016. DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients. abs\/1606.06160 (2016). arXiv:1606.06160"}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3476997","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3476997","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:30:46Z","timestamp":1750188646000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3476997"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,9,17]]},"references-count":38,"journal-issue":{"issue":"5s","published-print":{"date-parts":[[2021,10,31]]}},"alternative-id":["10.1145\/3476997"],"URL":"https:\/\/doi.org\/10.1145\/3476997","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"type":"print","value":"1539-9087"},{"type":"electronic","value":"1558-3465"}],"subject":[],"published":{"date-parts":[[2021,9,17]]},"assertion":[{"value":"2021-04-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-07-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-09-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}