{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,3]],"date-time":"2026-07-03T09:02:38Z","timestamp":1783069358843,"version":"3.54.6"},"reference-count":39,"publisher":"Association for Computing Machinery (ACM)","issue":"6","funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62090024, 62222411, 62025404, 92373206, 62202453, and ICT.CAS E463010011"],"award-info":[{"award-number":["62090024, 62222411, 62025404, 92373206, 62202453, and ICT.CAS E463010011"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Des. Autom. Electron. Syst."],"published-print":{"date-parts":[[2025,11,30]]},"abstract":"<jats:p>Recent advances in large language models (LLMs) have demonstrated significant potential for automated hardware description language (HDL) code generation from high-level specifications. However, two critical challenges limit further progress in this domain: the scarcity of quality Verilog training data and the inability of current approaches to generate RTL code optimized for power, performance, and area (PPA) metrics.<\/jats:p>\n                  <jats:p>This article presents a comprehensive data-centric framework that addresses these limitations through innovations in both pre-fine-tuning data preparation and after-fine-tuning optimization strategies. In the pre-fine-tuning phase, we tackle the data scarcity problem with an automated design-data augmentation framework that generates high-volume, high-quality natural language specifications aligned with corresponding Verilog code and EDA scripts. Our approach creates a complete RTL-level feedback loop by augmenting EDA scripts, RTL code, and EDA tool feedback. 
In the after-fine-tuning phase, we focus on generating PPA-aware RTL code through a novel search and prompt framework. Our approach implements iterative filtering and selection of LLM-generated Verilog variants while providing high-quality predefined prompts, including composition and interface specifications.<\/jats:p>\n                  <jats:p>To evaluate the effectiveness of our data augmentation method, we fine-tune Llama 2-13B and Llama 2-7B models using the dataset generated by our augmentation framework. The results demonstrate a significant improvement in the Verilog generation tasks with LLMs. Moreover, the accuracy of Verilog generation surpasses that of the current state-of-the-art open-source Verilog generation model, increasing from 58.8% to 70.6% with the same benchmark. Our 13B model has a pass rate improvement compared with GPT-3.5 in Verilog generation and outperforms in EDA script (i.e., SiliconCompiler) generation with only 200 EDA script data. Additionally, to evaluate the effectiveness of our agent framework, we compare the PPA on the GPT-3.5, where the results show that the agent-refined RTL code can have a better quality than the generated RTL code only with GPT-3.5.<\/jats:p>","DOI":"10.1145\/3727980","type":"journal-article","created":{"date-parts":[[2025,4,3]],"date-time":"2025-04-03T06:58:05Z","timestamp":1743663485000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["A data-centric chip design agent framework for Verilog code generation"],"prefix":"10.1145","volume":"30","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1920-0101","authenticated-orcid":false,"given":"Kaiyan","family":"Chang","sequence":"first","affiliation":[{"name":"SKLP, Institute of Computing Technology Chinese Academy of Sciences","place":["Beijing, China"]},{"name":"University of the Chinese Academy of Sciences","place":["Beijing, 
China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-9237-559X","authenticated-orcid":false,"given":"Wenlong","family":"Zhu","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology Chinese Academy of Sciences","place":["Beijing, China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-3273-2271","authenticated-orcid":false,"given":"Kun","family":"Wang","sequence":"additional","affiliation":[{"name":"CICS, Institute of Computing Technology Chinese Academy of Sciences","place":["Beijing, China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-2996-4478","authenticated-orcid":false,"given":"Xinyang","family":"He","sequence":"additional","affiliation":[{"name":"Chengdu Institute of Computer Applications, University of the Chinese Academy of Sciences","place":["Chengdu, China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-8242-5895","authenticated-orcid":false,"given":"Nan","family":"Yang","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology Chinese Academy of Sciences","place":["Beijing, China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-7039-7707","authenticated-orcid":false,"given":"Zhirong","family":"Chen","sequence":"additional","affiliation":[{"name":"CICS, Institute of Computing Technology Chinese Academy of Sciences","place":["Beijing, China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-9107-8764","authenticated-orcid":false,"given":"Dantong","family":"Jin","sequence":"additional","affiliation":[{"name":"Zhejiang Lab","place":["Hangzhou, 
China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7193-464X","authenticated-orcid":false,"given":"Cangyuan","family":"Li","sequence":"additional","affiliation":[{"name":"CICS, ICT CAS","place":["Beijing, China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5158-7417","authenticated-orcid":false,"given":"Yunhao","family":"Zhou","sequence":"additional","affiliation":[{"name":"Shanghai Innovation Center for Processor Technologies","place":["Shanghai, China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-6432-4385","authenticated-orcid":false,"given":"Hao","family":"Yan","sequence":"additional","affiliation":[{"name":"Shanghai University","place":["Shanghai, China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6252-2160","authenticated-orcid":false,"given":"Zhuoliang","family":"Zhao","sequence":"additional","affiliation":[{"name":"Shanghai Innovation Center for Processor Technologies","place":["Shanghai, China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-7591-0418","authenticated-orcid":false,"given":"Yuan","family":"Cheng","sequence":"additional","affiliation":[{"name":"Nanjing University","place":["Nanjing, China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7012-2308","authenticated-orcid":false,"given":"Mengdi","family":"Wang","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology Chinese Academy of Sciences","place":["Beijing, China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8407-2594","authenticated-orcid":false,"given":"Shengwen","family":"Liang","sequence":"additional","affiliation":[{"name":"SKLP, Institute of Computing Technology Chinese Academy of Sciences","place":["Beijing, 
China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5113-8067","authenticated-orcid":false,"given":"Yinhe","family":"Han","sequence":"additional","affiliation":[{"name":"SKLP, Chinese Academy of Sciences, Institute of Computing Technology","place":["Beijing, China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0874-814X","authenticated-orcid":false,"given":"Xiaowei","family":"Li","sequence":"additional","affiliation":[{"name":"SKLP, Chinese Academy of Sciences, Institute of Computing Technology","place":["Beijing, China"]},{"name":"University of the Chinese Academy of Sciences","place":["Beijing, China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8082-4218","authenticated-orcid":false,"given":"Huawei","family":"Li","sequence":"additional","affiliation":[{"name":"SKLP, Institute of Computing Technology, Chinese Academy of Sciences","place":["Beijing, China"]},{"name":"University of the Chinese Academy of Sciences","place":["Beijing, China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5172-4736","authenticated-orcid":false,"given":"Ying","family":"Wang","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology, Chinese Academy of Sciences","place":["Beijing, China"]}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2025,10,17]]},"reference":[{"key":"e_1_3_3_2_2","unstructured":"Baleegh Ahmad Shailja Thakur Benjamin Tan Ramesh Karri and Hammond Pearce. 2023. Fixing hardware security bugs with large language models. arXiv:2302.01215. Retrieved from https:\/\/arxiv.org\/abs\/2302.01215"},{"key":"e_1_3_3_3_2","doi-asserted-by":"publisher","DOI":"10.1145\/1553374.1553380"},{"key":"e_1_3_3_4_2","doi-asserted-by":"publisher","unstructured":"J. Blocklove S. Garg R. Karri and H. Pearce. 2023. 
Chip-Chat: Challenges and opportunities in conversational hardware design. 2023 ACM\/IEEE 5th Workshop on Machine Learning for CAD (MLCAD). Snowbird UT USA 1\u20136. DOI:10.1109\/MLCAD58807.2023.10299874","DOI":"10.1109\/MLCAD58807.2023.10299874"},{"key":"e_1_3_3_5_2","doi-asserted-by":"publisher","unstructured":"Jason Blocklove Shailja Thakur Benjamin Tan Hammond Pearce Siddharth Garg and Ramesh Karri. 2025. Automatically improving LLM-based verilog generation using EDA tool feedback. ACM Trans. Des. Autom. Electron. Syst. Just Accepted (March 2025). 10.1145\/3723876","DOI":"10.1145\/3723876"},{"key":"e_1_3_3_6_2","unstructured":"Kaiyan Chang Ying Wang Haimeng Ren Mengdi Wang Shengwen Liang Yinhe Han Huawei Li and Xiaowei Li. 2023. ChipGPT: How far are we from natural language hardware design. arXiv:2305.14019. Retrieved from https:\/\/arxiv.org\/abs\/2305.14019"},{"key":"e_1_3_3_7_2","unstructured":"Matthew DeLorenzo Animesh Basak Chowdhury Vasudev Gohil Shailja Thakur Ramesh Karri Siddharth Garg and Jeyavijayan Rajendran. 2024. Make every move count: LLM-based high-quality RTL code generation using MCTS. arXiv:2402.03289. Retrieved from https:\/\/arxiv.org\/abs\/2402.03289"},{"key":"e_1_3_3_8_2","unstructured":"Jesse Dodge Gabriel Ilharco Roy Schwartz Ali Farhadi Hannaneh Hajishirzi and Noah Smith. 2020. Fine-tuning pretrained language models: Weight initializations data orders and early stopping. arXiv:2002.06305. Retrieved from https:\/\/arxiv.org\/abs\/2002.06305"},{"key":"e_1_3_3_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCAD57390.2023.10323953"},{"key":"e_1_3_3_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/ITT56123.2022.9863935"},{"key":"e_1_3_3_11_2","doi-asserted-by":"publisher","unstructured":"H. Wu et\u00a0al. 2024. ChatEDA: A large language model powered autonomous agent for EDA. In IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 43 10 (2024) 3184\u20133197. 
DOI:10.1109\/TCAD.2024.3383347","DOI":"10.1109\/TCAD.2024.3383347"},{"key":"e_1_3_3_12_2","unstructured":"Danny Hernandez Jared Kaplan Tom Henighan and Sam McCandlish. 2021. Scaling laws for transfer. arXiv:2102.01293. Retrieved from https:\/\/arxiv.org\/abs\/2102.01293"},{"key":"e_1_3_3_13_2","volume-title":"Proceedings of the International Conference on Learning Representations (ICLR\u201922)","author":"Hu Edward J.","year":"2022","unstructured":"Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2022. LoRA: Low-rank adaptation of large language models. In Proceedings of the International Conference on Learning Representations (ICLR\u201922). Retrieved from https:\/\/openreview.net\/forum?id=nZeVKeeFYf9"},{"key":"e_1_3_3_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA48891.2023.10160326"},{"key":"e_1_3_3_15_2","doi-asserted-by":"publisher","unstructured":"R. Kande et\u00a0al. 2024. (Security) assertions by large language models. In IEEE Transactions on Information Forensics and Security 19 (2024) 4374\u20134389. DOI:10.1109\/TIFS.2024.3372809","DOI":"10.1109\/TIFS.2024.3372809"},{"key":"e_1_3_3_16_2","unstructured":"Zhiding Liang Jinglei Cheng Rui Yang Hang Ren Zhixin Song Di Wu Xuehai Qian Tongyang Li and Yiyu Shi. 2023. Unleashing the potential of LLMs for quantum computing: A study in quantum architecture design. arXiv:2307.08191. Retrieved from https:\/\/arxiv.org\/abs\/2307.08191"},{"key":"e_1_3_3_17_2","volume-title":"Proceedings of the 2023 IEEE\/ACM International Conference on Computer-Aided Design (ICCAD\u201923)","author":"Liu Mingjie","year":"2023","unstructured":"Mingjie Liu, Nathaniel Pinckney, Brucek Khailany, and Haoxing Ren. 2023. VerilogEval: Evaluating large language models for verilog code generation. 
In Proceedings of the 2023 IEEE\/ACM International Conference on Computer-Aided Design (ICCAD\u201923)."},{"key":"e_1_3_3_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2024.3483089"},{"key":"e_1_3_3_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/LAD62341.2024.10691788"},{"key":"e_1_3_3_20_2","volume-title":"Proceedings of the Asia and South Pacific Design Automation Conference (ASP-DAC\u201923)","author":"Lu Yao","year":"2023","unstructured":"Yao Lu, Shang Liu, Qijun Zhang, and Zhiyao Xie. 2023. RTLLM: An open-source benchmark for design RTL generation with large language model. In Proceedings of the Asia and South Pacific Design Automation Conference (ASP-DAC\u201923)."},{"key":"e_1_3_3_21_2","unstructured":"Teo Ene Mingjie Liu\u00a7. 2023. ChipNeMo: Domain-adapted LLMs for chip design. arXiv:2307.09288. Retrieved from https:\/\/arxiv.org\/abs\/2307.09288"},{"key":"e_1_3_3_22_2","doi-asserted-by":"publisher","DOI":"10.1145\/3489517.3530673"},{"key":"e_1_3_3_23_2","unstructured":"Marcelo Orenes-Vera Margaret Martonosi and David Wentzlaff. 2023. From RTL to SVA: LLM-assisted generation of formal verification testbenches. arxiv:2309.09437 [cs.AR]. Retrieved from https:\/\/arxiv.org\/abs\/2309.09437"},{"key":"e_1_3_3_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/3380446.3430634"},{"key":"e_1_3_3_25_2","unstructured":"Zehua Pei Hui-Ling Zhen Mingxuan Yuan Yu Huang and Bei Yu. 2024. BetterV: controlled verilog generation with discriminative guidance. In Proceedings of the 41st International Conference on Machine Learning (ICML\u201924) Vol. 235. JMLR.org 40145\u201340153."},{"key":"e_1_3_3_26_2","article-title":"Direct preference optimization: Your language model is secretly a reward model","author":"Rafailov Rafael","year":"2024","unstructured":"Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D. Manning, Stefano Ermon, and Chelsea Finn. 2024. Direct preference optimization: Your language model is secretly a reward model. 
In Proceedings of the 37th International Conference on Neural Information Processing Systems.","journal-title":"Proceedings of the 37th International Conference on Neural Information Processing Systems"},{"key":"e_1_3_3_27_2","first-page":"1","article-title":"Mathematical discoveries from program search with large language models","author":"Romera-Paredes Bernardino","year":"2023","unstructured":"Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Matej Balog, M. Pawan Kumar, Emilien Dupont, Francisco J. R. Ruiz, Jordan S. Ellenberg, Pengming Wang, Omar Fawzi, et\u00a0al. 2023. Mathematical discoveries from program search with large language models. Nature 625, 7995 (2023), 1\u20133.","journal-title":"Nature"},{"key":"e_1_3_3_28_2","doi-asserted-by":"publisher","DOI":"10.23919\/DATE56975.2023.10137086"},{"key":"e_1_3_3_29_2","doi-asserted-by":"publisher","unstructured":"Shailja Thakur Baleegh Ahmad Hammond Pearce Benjamin Tan Brendan Dolan-Gavitt Ramesh Karri and Siddharth Garg. 2024. VeriGen: A large language model for verilog code generation. ACM Trans. Des. Autom. Electron. Syst. 29 3 Article 46 (May 2024) 31 pages. 10.1145\/3643681","DOI":"10.1145\/3643681"},{"key":"e_1_3_3_30_2","unstructured":"Hugo Touvron Louis Martin Kevin R. Stone Peter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra Prajjwal Bhargava Shruti Bhosale et al. 2023. Llama 2: Open foundation and fine-tuned chat models. arXiv:2307.09288. Retrieved from https:\/\/arxiv.org\/abs\/2307.09288 https:\/\/api.semanticscholar.org\/CorpusID:259950998"},{"key":"e_1_3_3_31_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41586-023-06747-5"},{"key":"e_1_3_3_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/3649329.3657353"},{"issue":"9","key":"e_1_3_3_33_2","first-page":"4555","article-title":"A survey on curriculum learning","volume":"44","author":"Wang Xin","year":"2021","unstructured":"Xin Wang, Yudong Chen, and Wenwu Zhu. 2021. A survey on curriculum learning. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 9 (2021), 4555\u20134576.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_3_3_34_2","doi-asserted-by":"publisher","DOI":"10.1145\/3649329.3658493"},{"key":"e_1_3_3_35_2","doi-asserted-by":"publisher","unstructured":"Y. Wei et\u00a0al. 2024. Editable scene simulation for autonomous driving via collaborative LLM-Agents. IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Seattle WA USA 15077\u201315087. DOI:10.1109\/CVPR52733.2024.01428","DOI":"10.1109\/CVPR52733.2024.01428"},{"key":"e_1_3_3_36_2","doi-asserted-by":"crossref","unstructured":"Zheyu Yan Yifan Qin Xiaobo Sharon Hu and Yiyu Shi. 2023. On the viability of using LLMs for SW\/HW co-design: An example in designing CiM DNN accelerators. arXiv:2306.06923. Retrieved from https:\/\/arxiv.org\/abs\/2306.06923","DOI":"10.1109\/SOCC58585.2023.10256783"},{"key":"e_1_3_3_37_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jss.2022.111304"},{"key":"e_1_3_3_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/LAD62341.2024.10691738"},{"key":"e_1_3_3_39_2","unstructured":"Yang Zhao Di Huang Chongxiao Li Pengwei Jin Ziyuan Nan Tianyun Ma Lei Qi Yansong Pan Zhenxing Zhang Rui Zhang et\u00a0al. 2024. CodeV: Empowering LLMs for Verilog generation through multi-level summarization. arxiv:2407.10424 [cs.PL]. Retrieved from https:\/\/arxiv.org\/abs\/2407.10424"},{"key":"e_1_3_3_40_2","first-page":"1","article-title":"Least-to-most prompting enables complex reasoning in large language models","author":"Zhou Denny","year":"2023","unstructured":"Denny Zhou, Nathanael Sch\u00e4rli, Le Hou, Jason Wei, Nathan Scales, Xuezhi Wang, Dale Schuurmans, Claire Cui, Olivier Bousquet, Quoc V. Le, et al. 2023. Least-to-most prompting enables complex reasoning in large language models. In Proceedings of the International Conference on Learning Representations . 1\u201361. 
Retrieved from https:\/\/openreview.net\/forum?id=WZH7099tgfM","journal-title":"Proceedings of the International Conference on Learning Representations"}],"container-title":["ACM Transactions on Design Automation of Electronic Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3727980","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,16]],"date-time":"2026-04-16T16:55:14Z","timestamp":1776358514000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3727980"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,17]]},"references-count":39,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2025,11,30]]}},"alternative-id":["10.1145\/3727980"],"URL":"https:\/\/doi.org\/10.1145\/3727980","relation":{},"ISSN":["1084-4309","1557-7309"],"issn-type":[{"value":"1084-4309","type":"print"},{"value":"1557-7309","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,10,17]]},"assertion":[{"value":"2024-07-31","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-03-31","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-10-17","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}