{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,1]],"date-time":"2026-06-01T23:20:33Z","timestamp":1780356033060,"version":"3.54.1"},"reference-count":49,"publisher":"Association for Computing Machinery (ACM)","issue":"3","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2025,9,30]]},"abstract":"<jats:p>Path planning is a critical task for autonomous driving, aiming to generate smooth, collision-free, and feasible paths based on input perception and localization information. The planning task is both highly time-sensitive and computationally intensive, posing significant challenges to resource-constrained autonomous driving hardware. In this article, we propose an end-to-end framework for accelerating path planning on FPGA platforms. This framework focuses on accelerating quadratic programming (QP) solving, which is the core of optimization-based path planning and has the most computationally-intensive workloads. Our method leverages a hardware-friendly alternating direction method of multipliers (ADMM) to solve QP problems while employing a highly parallelizable preconditioned conjugate gradient (PCG) method for solving the associated linear systems. We analyze the sparse patterns of matrix operations in QP and design customized storage schemes along with efficient sparse matrix multiplication and sparse matrix-vector multiplication units. Our customized design significantly reduces resource consumption for data storage and computation while dramatically speeding up matrix operations. Additionally, we propose a multi-level dataflow optimization strategy. Within individual operators, we achieve acceleration through parallelization and pipelining. For different operators in an algorithm, we analyze inter-operator data dependencies to enable fine-grained pipelining. At the system level, we map different steps of the planning process to the CPU and FPGA and pipeline these steps to enhance end-to-end throughput. We implement and validate our design on the AMD ZCU102 platform. Our implementation achieves state-of-the-art performance in both latency and energy efficiency compared with existing works, including an average 1.48\u00d7 speedup over the best FPGA-based design, a 2.89\u00d7 speedup compared with the state-of-the-art QP solver on an Intel i7-11800H CPU, a 5.62\u00d7 speedup over an ARM Cortex-A57 embedded CPU, and a 1.56\u00d7 speedup over state-of-the-art GPU-based work. Furthermore, our design delivers a 2.05\u00d7 improvement in throughput compared with the state-of-the-art FPGA-based design.<\/jats:p>","DOI":"10.1145\/3750449","type":"journal-article","created":{"date-parts":[[2025,7,24]],"date-time":"2025-07-24T11:23:34Z","timestamp":1753356214000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["A Sparsity-Aware Autonomous Path Planning Accelerator with HW\/SW Co-Design and Multi-Level Dataflow Optimization"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0009-0002-7315-8151","authenticated-orcid":false,"given":"Yifan","family":"Zhang","sequence":"first","affiliation":[{"name":"Department of Electrical Engineering and Computer Science, University of California Irvine","place":["Irvine, United States"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-6510-8241","authenticated-orcid":false,"given":"Xiaoyu","family":"Niu","sequence":"additional","affiliation":[{"name":"Beijing Institute of Technology","place":["Beijing, China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-6253-5059","authenticated-orcid":false,"given":"Hongzheng","family":"Tian","sequence":"additional","affiliation":[{"name":"University of California Irvine","place":["Irvine, United States"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3095-625X","authenticated-orcid":false,"given":"Yanjun","family":"Zhang","sequence":"additional","affiliation":[{"name":"Beijing Institute of Technology","place":["Beijing, China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0139-3622","authenticated-orcid":false,"given":"Bo","family":"Yu","sequence":"additional","affiliation":[{"name":"Shenzhen Institute of Artificial Intelligence and Robotics for Society","place":["Shenzhen, China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5132-8351","authenticated-orcid":false,"given":"Shaoshan","family":"Liu","sequence":"additional","affiliation":[{"name":"Shenzhen Institute of Artificial Intelligence and Robotics for Society","place":["Shenzhen, China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7669-1467","authenticated-orcid":false,"given":"Sitao","family":"Huang","sequence":"additional","affiliation":[{"name":"Electrical Engineering and Computer Science, University of California Irvine Henry Samueli School of Engineering","place":["Irvine, United States"]}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2025,9,19]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-01802-2"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2019.2915983"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/3620665.3640379"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/3647638"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO50266.2020.00089"},{"key":"e_1_3_1_7_2","unstructured":"Baidu Inc. Baidu apollo. https:\/\/github.com\/ApolloAuto\/apollo\/. Accessed: 2025-08-08."},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511546877"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/MCAS.2021.3071609"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2020.2970924"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1016\/0898-1221(76)90003-1"},{"key":"e_1_3_1_12_2","unstructured":"Haoyang Fan Fan Zhu Changchun Liu Liangliang Zhang Li Zhuang Dong Li Weicheng Zhu Jiangtao Hu Hongye Li and Qi Kong. 2018. Baidu Apollo EM Motion Planner. arXiv preprint arXiv:1807.08048."},{"key":"e_1_3_1_13_2","doi-asserted-by":"crossref","unstructured":"Alessandro Gasparetto Paolo Boscariol Albano Lanzutti and Renato Vidoni. 2015. Path planning and trajectory planning algorithms: A general overview. In Motion and Operation Planning of Robotic Systems: Background and Practical Approaches Mechanisms and Machine Science Giuseppe Carbone and Fernando G\u00f3mez-Bravo (Eds.). 29 (2015) 3\u201327.","DOI":"10.1007\/978-3-319-14705-5_1"},{"issue":"48105","key":"e_1_3_1_14_2","first-page":"18","article-title":"Practical search techniques in path planning for autonomous driving","volume":"1001","author":"Dolgov Dmitri","year":"2008","unstructured":"Dmitri Dolgov, Sebastian Thrun, Michael Montemerlo, and James Diebel. 2008. Practical search techniques in path planning for autonomous driving. Ann Arbor 1001, 48105 (2008), 18\u201380.","journal-title":"Ann Arbor"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10846-019-01112-z"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/IRDS.2002.1041624"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2011.5980479"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.14569\/IJACSA.2016.071114"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/TASE.2020.2976560"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.2514\/2.4231"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/1553374.1553508"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2016.2527062"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/TRO.2018.2878996"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.23919\/ACC50511.2021.9483020"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1007\/s12532-014-0071-1"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/641876.641880"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1007\/s12532-020-00179-2"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/IV47402.2020.9304787"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPT.2017.8280159"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2019.00024"},{"key":"e_1_3_1_31_2","first-page":"1","volume-title":"2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS)","author":"Wan Zishen","unstructured":"Zishen Wan, Yuyang Zhang, Arijit Raychowdhury, Bo Yu, Yanjun Zhang, and Shaoshan Liu. An energy-efficient quad-camera visual system for autonomous machines on fpga platform. In 2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS). 1\u20134."},{"key":"e_1_3_1_32_2","first-page":"479","volume-title":"MICRO-54: 54th Annual IEEE\/ACM International Symposium on Microarchitecture","author":"Liu Weizhuang","unstructured":"Weizhuang Liu, Bo Yu, Yiming Gan, Qiang Liu, Jie Tang, Shaoshan Liu, and Yuhao Zhu. Archytas: A framework for synthesizing and dynamically optimizing accelerators for robotic localization. In MICRO-54: 54th Annual IEEE\/ACM International Symposium on Microarchitecture. 479\u2013493."},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA51647.2021.00074"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1145\/3508352.3561112"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/CICC53496.2022.9772870"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/DAC56929.2023.10247780"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/1950413.1950454"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/AICAS54282.2022.9869951"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2024.3377895"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/VLSICircuits18222.2020.9162951"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2015.7139620"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cad.2009.12.007"},{"key":"e_1_3_1_43_2","unstructured":"Daniel Ruiz. 2001. A scaling algorithm to equilibrate both rows and columns norms in matrices 1. Technical Report RAL-TR-2001-034 Rutherford Appleton Laboratory 2001. Also published as ENSEEIHT-IRIT Technical Report RT\/APO\/01\/4."},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2020.05.021"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1145\/3676536.3676700"},{"key":"e_1_3_1_46_2","unstructured":"Jiangnan Li. path_optimizer. Retrieved from https:\/\/github.com\/LiJiangnanBit\/path_optimizer. ([n. d.])."},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA57147.2024.10611249"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1145\/3579371.3589108"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO61859.2024.00115"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1145\/3543622.3573182"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3750449","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,20]],"date-time":"2025-09-20T00:49:21Z","timestamp":1758329361000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3750449"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,19]]},"references-count":49,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2025,9,30]]}},"alternative-id":["10.1145\/3750449"],"URL":"https:\/\/doi.org\/10.1145\/3750449","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,9,19]]},"assertion":[{"value":"2025-02-08","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-07-03","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-09-19","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}