{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,29]],"date-time":"2026-03-29T16:11:49Z","timestamp":1774800709521,"version":"3.50.1"},"reference-count":36,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2011,6,1]],"date-time":"2011-06-01T00:00:00Z","timestamp":1306886400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Des. Autom. Electron. Syst."],"published-print":{"date-parts":[[2011,6]]},"abstract":"<jats:p>In this article, we developed a massively parallel gate-level logical simulator to address the ever-increasing computing demand for VLSI verification. To the best of the authors\u2019 knowledge, this work is the first one to leverage the power of modern GPUs to successfully unleash the massive parallelism of a conservative discrete event-driven algorithm, CMB algorithm. A novel data-parallel strategy is proposed to manipulate the fine-grain message passing mechanism required by the CMB protocol. To support robust and complete simulation for real VLSI designs, we establish both a memory paging mechanism and an adaptive issuing strategy to efficiently utilize the GPU memory with a limited capacity. A set of GPU architecture-specific optimizations are performed to further enhance the overall simulation performance. On average, our simulator outperforms a CPU baseline event-driven simulator by a factor of 47.4X. 
This work proves that the CMB algorithm can be efficiently and effectively deployed on modern GPUs without the performance overhead that had hindered its successful applications on previous parallel architectures.<\/jats:p>","DOI":"10.1145\/1970353.1970362","type":"journal-article","created":{"date-parts":[[2011,6,14]],"date-time":"2011-06-14T14:44:54Z","timestamp":1308062694000},"page":"1-20","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":27,"title":["Massively Parallel Logic Simulation with GPUs"],"prefix":"10.1145","volume":"16","author":[{"given":"Yuhao","family":"Zhu","sequence":"first","affiliation":[{"name":"Beihang University"}]},{"given":"Bo","family":"Wang","sequence":"additional","affiliation":[{"name":"Tsinghua University"}]},{"given":"Yangdong","family":"Deng","sequence":"additional","affiliation":[{"name":"Tsinghua University"}]}],"member":"320","published-online":{"date-parts":[[2011,6]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/1465482.1465560"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/185403.185424"},{"key":"e_1_2_1_3_1","volume-title":"Proceedings of the IEEE\/ACM International Conference on Computer-Aided Design (ICCAD\u201992)","author":"Bataineh A."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2008.917718"},{"key":"e_1_2_1_5_1","unstructured":"Bryant R. E. 1977. Simulation of packet communications architecture computer system. Tech. rep. MIT-LCS-TR-188 MIT."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.1979.230182"},{"key":"e_1_2_1_7_1","first-page":"105","article-title":"Distributed simulation of networks","volume":"3","author":"Chandy K. M.","year":"1979","journal-title":"Comput. 
Netw."},{"key":"e_1_2_1_8_1","volume-title":"Proceedings of the Conference on Design, Automation, and Test in Europe (DATE\u201909)","author":"Chatterjee D."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/1629911.1630056"},{"key":"e_1_2_1_10_1","unstructured":"CUDA 2.3. 2011. NVidia CUDA programming guide 2.3. http:\/\/developer.download.nvidia.com\/compute\/cuda\/2_3\/toolkit\/docs\/NVIDIA_CUDA_Programming_Guide_2.3.pdf."},{"key":"e_1_2_1_11_1","unstructured":"Fujimoto R. M. 2000. Parallel and Distributed Simulation Systems. Wiley-Interscience."},{"key":"e_1_2_1_12_1","unstructured":"GTX 280. 2011. GeForce GTX280. http:\/\/www.nvidia.com\/object\/geforcefamily.html."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/1391469.1391679"},{"key":"e_1_2_1_14_1","unstructured":"Holmes V. 1978. Parallel algorithms on multiple processor architectures. Doctoral dissertation University of Texas at Austin Austin TX."},{"key":"e_1_2_1_15_1","unstructured":"IEEE System C. 2011. IEEE Std. 1666-2005 Standard for SystemC. http:\/\/ieeexplore.ieee.org\/xpl\/mostRecentIssue.jsp?punumber=10761."},{"key":"e_1_2_1_16_1","unstructured":"ITC99. 2011. ITC99 benchmarks. http:\/\/www.cad.polito.it\/tools\/itc99.html."},
{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3916.3988"},{"key":"e_1_2_1_18_1","first-page":"183","article-title":"Fast concurrent simulation using the time warp mechanism","volume":"19","author":"Jefferson D.","year":"1985","journal-title":"Distrib. Syst."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2008.31"},{"key":"e_1_2_1_20_1","volume-title":"Proceedings of the SCS Multiconference on Distributed Simulation. 183--191","author":"Lubachevsky B. D.","year":"1988"},{"key":"e_1_2_1_21_1","unstructured":"MPI. 2011. MPI. http:\/\/www.mpi-forum.org\/docs\/."},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the 15th Asia and South Pacific Design Automation Conference (ASPDAC\u201910)","author":"Nanjundappa M."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/102810.102812"},{"key":"e_1_2_1_24_1","unstructured":"NVIDIA. 2009. NVidia white paper: NVIDIA\u2019s next generation CUDA\u2122 compute architecture: Fermi. http:\/\/www.nvidia.com\/content\/PDF\/fermi_white_papers\/NVIDIA_Fermi_Compute_Architecture_Whitepaper.pdf"},{"key":"e_1_2_1_25_1","unstructured":"OpenCores. 2011. OpenCores http:\/\/www.opencores.org\/."},{"key":"e_1_2_1_26_1","volume-title":"Proceedings of the 40th Conference on Winter Simulation (WSC\u201908)","author":"Park H."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1177\/0037549709340781"},{"key":"e_1_2_1_28_1","first-page":"44","article-title":"Distributed simulation using a network of processors","volume":"3","author":"Peacock J. K.","year":"1979","journal-title":"Comput. 
Netw."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.5555\/1218112.1218132"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/PADS.2006.15"},{"key":"e_1_2_1_31_1","unstructured":"Rashinkar P. Paterson P. and Singh L. 2000. System-on-a-Chip Verification: Methodology and Techniques 1st Ed. Springer."},{"key":"e_1_2_1_32_1","first-page":"112","article-title":"Cancellation strategies in optimistic execution systems","volume":"22","author":"Reiher P. L.","year":"1990","journal-title":"Proc. SCS Multiconf. Distrib. Simul."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/SIMUL.2009.36"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/130611.130613"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2007.12"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/PADS.2007.20"}],"container-title":["ACM Transactions on Design Automation of Electronic 
Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1970353.1970362","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1970353.1970362","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T10:52:52Z","timestamp":1750243972000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1970353.1970362"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,6]]},"references-count":36,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2011,6]]}},"alternative-id":["10.1145\/1970353.1970362"],"URL":"https:\/\/doi.org\/10.1145\/1970353.1970362","relation":{},"ISSN":["1084-4309","1557-7309"],"issn-type":[{"value":"1084-4309","type":"print"},{"value":"1557-7309","type":"electronic"}],"subject":[],"published":{"date-parts":[[2011,6]]},"assertion":[{"value":"2010-01-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2011-03-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2011-06-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}