{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,4]],"date-time":"2026-06-04T02:52:30Z","timestamp":1780541550025,"version":"3.54.1"},"reference-count":44,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2025,11,22]],"date-time":"2025-11-22T00:00:00Z","timestamp":1763769600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key R&D Program of China","doi-asserted-by":"publisher","award":["2024YFC3017000"],"award-info":[{"award-number":["2024YFC3017000"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2026,1,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>The explicit integration for nonlinear structural dynamics in finite element analysis (FEA) is inherently decoupled in its algebraic equations, making it well-suited for parallel computation. This paper presents a novel and efficient central processing unit (CPU)\/graphics processing unit (GPU) implementation and optimization strategy for the explicit integration of complex tall buildings subjected to seismic loading for the design software YJK. The presence of multiple element types and distinct material constitutive laws in finite element (FE) models of reinforced concrete building structures results in significant computational overhead and branching. In this paper, the calculation-related data for a FE model is reorganized into several data-domains, each corresponding to sole element type and sole material constitutive law. To achieve higher computational performance, a concurrent kernel execution strategy is implemented on the GPU platform. 
Instead of relying on the default, inefficient kernel scheduler of GPU, we developed an efficient scheduler to maximize GPU utilization. This scheduler first measures resource requirements of each kernel, then ranks and divides them into sub-kernels for concurrent execution. Performance tests on practical engineering project demonstrate that, without compromising accuracy, the proposed optimization strategy achieves up to 328.66\u00a0\u00d7\u00a0performance improvement over CPU serial implementation, and up to 4.76\u00a0\u00d7\u00a0and 1.59\u00a0\u00d7\u00a0improvements over a simpler GPU implementation and the default GPU scheduler, respectively.<\/jats:p>","DOI":"10.1093\/jcde\/qwaf127","type":"journal-article","created":{"date-parts":[[2025,11,21]],"date-time":"2025-11-21T13:15:30Z","timestamp":1763730930000},"page":"141-157","source":"Crossref","is-referenced-by-count":1,"title":["A GPU optimization strategy in nonlinear explicit dynamic analysis for reinforced concrete buildings with composite elements"],"prefix":"10.1093","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0009-0005-2458-4930","authenticated-orcid":false,"given":"Lanqi","family":"Liu","sequence":"first","affiliation":[{"name":"School of Mechanics and Engineering Science, College of Engineering, Peking University , Beijing 100871 ,","place":["China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yongqiang","family":"Chen","sequence":"additional","affiliation":[{"name":"School of Mechanics and Engineering Science, College of Engineering, Peking University , Beijing 100871 ,","place":["China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Xianlei","family":"Wang","sequence":"additional","affiliation":[{"name":"YJK Building Software Limited , Beijing 100013 ,","place":["China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Zhongliang","family":"Su","sequence":"additional","affiliation":[{"name":"YJK Building Software Limited , Beijing 
100013 ,","place":["China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5115-119X","authenticated-orcid":false,"given":"Pu","family":"Chen","sequence":"additional","affiliation":[{"name":"School of Mechanics and Engineering Science, College of Engineering, Peking University , Beijing 100871 ,","place":["China"]},{"name":"State Key Laboratory for Turbulence & Complex Systems, Peking University , Beijing 100871 ,","place":["China"]}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2025,11,22]]},"reference":[{"key":"2026010206312984300_bib1","doi-asserted-by":"publisher","first-page":"46","DOI":"10.1016\/j.compstruc.2016.05.002","article-title":"GPU computing for accelerating the numerical Path Integration approach","volume":"171","author":"Alevras","year":"2016","journal-title":"Computers & Structures"},{"key":"2026010206312984300_bib2","doi-asserted-by":"publisher","first-page":"104","DOI":"10.1109\/RTSS.2017.00017","article-title":"GPU Scheduling on the NVIDIA TX2: Hidden Details Revealed","author":"Amert","year":"2017","journal-title":"2017 IEEE Real-Time Systems Symposium (RTSS)"},{"key":"2026010206312984300_bib3","doi-asserted-by":"publisher","first-page":"243","DOI":"10.1086\/344723","article-title":"Nonoscillatory Central Difference and Artificial Viscosity Schemes for Relativistic Hydrodynamics","volume":"144","author":"Anninos","year":"2003","journal-title":"The Astrophysical Journal Supplement Series"},{"key":"2026010206312984300_bib4","doi-asserted-by":"publisher","first-page":"29","DOI":"10.1016\/j.compstruc.2015.03.005","article-title":"An explicit dynamics GPU structural solver for thin shell finite elements","volume":"154","author":"Bartezzaghi","year":"2015","journal-title":"Computers & Structures"},{"key":"2026010206312984300_bib5","volume-title":"Nonlinear Finite Elements for Continua and 
Structures","author":"Belytschko","year":"2014"},{"key":"2026010206312984300_bib6","doi-asserted-by":"publisher","first-page":"370","DOI":"10.1016\/j.advengsoft.2011.10.014","article-title":"Development of parallel explicit finite element sheet forming simulation system based on GPU architecture","volume":"45","author":"Cai","year":"2012","journal-title":"Advances in Engineering Software"},{"key":"2026010206312984300_bib7","doi-asserted-by":"publisher","first-page":"640","DOI":"10.1002\/nme.2989","article-title":"Assembly of finite element methods on graphics processors","volume":"85","author":"Cecka","year":"2011","journal-title":"International Journal for Numerical Methods in Engineering"},{"key":"2026010206312984300_bib8","volume-title":"Professional CUDA C programming","author":"Cheng","year":"2014"},{"key":"2026010206312984300_bib9","doi-asserted-by":"publisher","first-page":"3127","DOI":"10.4028\/www.scientific.net\/AMM.580-583.3127","article-title":"An Integrated Simulation System for Building Structures","volume":"580\u2013583","author":"Duan","year":"2014","journal-title":"Applied Mechanics and Materials"},{"key":"2026010206312984300_bib10","doi-asserted-by":"publisher","first-page":"111","DOI":"10.1007\/s11831-013-9082-8","article-title":"GPU Acceleration for FEM-Based Structural Analysis","volume":"20","author":"Georgescu","year":"2013","journal-title":"Archives of Computational Methods in Engineering"},{"key":"2026010206312984300_bib11","doi-asserted-by":"publisher","first-page":"81","DOI":"10.1145\/3453953.3453972","article-title":"Demystifying the Placement Policies of the NVIDIA GPU Thread Block Scheduler for Concurrent Kernels","volume":"48","author":"Gilman","year":"2021","journal-title":"ACM SIGMETRICS Performance Evaluation Review"},{"key":"2026010206312984300_bib12","doi-asserted-by":"publisher","first-page":"32","DOI":"10.1145\/3529113.3529124","article-title":"Characterizing Concurrency Mechanisms for NVIDIA GPUs under Deep Learning 
Workloads","volume":"49","author":"Gilman","year":"2022","journal-title":"ACM SIGMETRICS Performance Evaluation Review"},{"key":"2026010206312984300_bib13","volume-title":"The Finite Element Method: Linear Static and Dynamic Finite Element Analysis","author":"Hughes","year":"2012"},{"key":"2026010206312984300_bib14","doi-asserted-by":"publisher","first-page":"200","DOI":"10.1016\/j.advengsoft.2018.02.008","article-title":"CUDA accelerated implementation of parallel dynamic relaxation","volume":"125","author":"Iv\u00e1nyi","year":"2018","journal-title":"Advances in Engineering Software"},{"key":"2026010206312984300_bib15","doi-asserted-by":"publisher","first-page":"705","DOI":"10.1016\/j.jcde.2018.11.001","article-title":"GPU-warp based finite element matrices generation and assembly using coloring method","volume":"6","author":"Kiran","year":"2019","journal-title":"Journal of Computational Design and Engineering"},{"key":"2026010206312984300_bib16","doi-asserted-by":"publisher","first-page":"451","DOI":"10.1016\/j.jpdc.2009.01.006","article-title":"Porting a high-order finite-element earthquake modeling application to NVIDIA graphics cards using CUDA","volume":"69","author":"Komatitsch","year":"2009","journal-title":"Journal of Parallel and Distributed Computing"},{"key":"2026010206312984300_bib17","doi-asserted-by":"publisher","first-page":"81","DOI":"10.1007\/s40430-019-2160-6","article-title":"Numerical investigation of supersonic transverse jet interaction on CPU\/GPU system","volume":"42","author":"Lai","year":"2020","journal-title":"Journal of the Brazilian Society of Mechanical Sciences and Engineering"},{"key":"2026010206312984300_bib18","doi-asserted-by":"publisher","first-page":"3055","DOI":"10.3390\/buildings13123055","article-title":"Analysis and Application of Double Steel Plate Concrete Composite Shear Wall in the R&D Building of Zhanjiang Bay 
Laboratory","volume":"13","author":"Lan","year":"2023","journal-title":"Buildings"},{"key":"2026010206312984300_bib19","doi-asserted-by":"publisher","first-page":"301","DOI":"10.1109\/BigData.2014.7004245","article-title":"Performance modeling in CUDA streams\u2014A means for high-throughput data processing","author":"Li","year":"2014","journal-title":"2014 IEEE International Conference on Big Data (Big Data)"},{"key":"2026010206312984300_bib20","doi-asserted-by":"publisher","first-page":"273","DOI":"10.6180\/jase.2016.19.3.05","article-title":"The Building Information Modeling and its Use for Data Transformation in the Structural Design Stage","volume":"19","author":"Liu","year":"2016","journal-title":"Journal of Applied Science and Engineering"},{"key":"2026010206312984300_bib21","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-15-9532-5","volume-title":"Earthquake Disaster Simulation of Civil Infrastructures: From Tall Buildings to Urban Areas","author":"Lu","year":"2021","edition":"Second Edition"},{"key":"2026010206312984300_bib22","doi-asserted-by":"publisher","first-page":"2195","DOI":"10.1007\/s00170-016-8542-3","article-title":"An accelerated explicit method with GPU parallel computing for thermal stress and welding deformation of large structure models","volume":"87","author":"Ma","year":"2016","journal-title":"The International Journal of Advanced Manufacturing Technology"},{"key":"2026010206312984300_bib23","doi-asserted-by":"publisher","first-page":"354","DOI":"10.1145\/3123939.3124538","article-title":"Constructing and characterizing covert channels on GPGPUs","author":"Naghibijouybari","year":"2017","journal-title":"Proceedings of the 50th Annual IEEE\/ACM International Symposium on Microarchitecture"},{"key":"2026010206312984300_bib24","volume-title":"Fermi 
Architecture","author":"NVIDIA","year":"2009"},{"key":"2026010206312984300_bib25","volume-title":"CUDA","author":"NVIDIA","year":"2025"},{"key":"2026010206312984300_bib26","volume-title":"CUPTI","author":"NVIDIA","year":"2025"},{"key":"2026010206312984300_bib27","doi-asserted-by":"publisher","first-page":"140","DOI":"10.1007\/978-3-642-38718-0_16","article-title":"Implementation and Evaluation of 3D Finite Element Method Application for CUDA","volume":"7851","author":"Ohshima","year":"2013","journal-title":"International Conference on High Performance Computing for Computational Science"},{"key":"2026010206312984300_bib28","volume-title":"PEER Ground Motion Database","author":"PEER Center","year":"2025"},{"key":"2026010206312984300_bib29","doi-asserted-by":"publisher","first-page":"21","DOI":"10.1007\/978-981-19-8657-4","article-title":"Aseismic Design of an Out-of-Code High-Rise Building in Shanghai","volume-title":"Advances in Frontier Research on Engineering Structures","author":"Ren","year":"2023"},{"key":"2026010206312984300_bib30","volume-title":"NVIDIA","author":"Rennich","year":"2011"},{"key":"2026010206312984300_bib31","doi-asserted-by":"publisher","first-page":"766","DOI":"10.1109\/TPDS.2019.2944602","article-title":"cCUDA: Effective Co-Scheduling of Concurrent Kernels on GPUs","volume":"31","author":"Shekofteh","year":"2020","journal-title":"IEEE Transactions on Parallel and Distributed Systems"},{"key":"2026010206312984300_bib32","volume-title":"Basic Principles of Concrete Structures","author":"Shi","year":"2024"},{"key":"2026010206312984300_bib33","doi-asserted-by":"publisher","first-page":"04022253","DOI":"10.1061\/JSENDH.STENG-11311","article-title":"Challenges in GPU-Accelerated Nonlinear Dynamic Analysis for Structural Systems","volume":"149","author":"Simpson","year":"2023","journal-title":"Journal of Structural 
Engineering"},{"key":"2026010206312984300_bib34","volume-title":"ABAQUS","author":"SIMULIA","year":"2020"},{"key":"2026010206312984300_bib35","doi-asserted-by":"publisher","first-page":"711","DOI":"10.1002\/(SICI)1096-9845(199607)25:7<711::AID-EQE576>3.0.CO;2-9","article-title":"Fibre Beam-Column Model for Non-Linear Analysis of R\/C Frames: Part I. Formulation","volume":"25","author":"Spacone","year":"1996","journal-title":"Earthquake Engineering & Structural Dynamics"},{"key":"2026010206312984300_bib36","doi-asserted-by":"publisher","first-page":"5","DOI":"10.32604\/cmes.2020.08104","article-title":"Parallelized Implementation of the Finite Particle Method for Explicit Dynamics in GPU","volume":"122","author":"Tang","year":"2020","journal-title":"Computer Modeling in Engineering & Sciences"},{"key":"2026010206312984300_bib37","doi-asserted-by":"publisher","first-page":"211","DOI":"10.1093\/jcde\/qwaa018","article-title":"High-performance practical stiffness analysis of high-rise buildings using superfloor elements","volume":"7","author":"Torky","year":"2020","journal-title":"Journal of Computational Design and Engineering"},{"key":"2026010206312984300_bib38","doi-asserted-by":"publisher","first-page":"107516","DOI":"10.1016\/j.compstruc.2024.107516","article-title":"A GPU-Accelerated automated multilevel substructuring method for modal analysis of structures","volume":"305","author":"Wang","year":"2024","journal-title":"Computers & Structures"},{"key":"2026010206312984300_bib39","doi-asserted-by":"publisher","first-page":"2193","DOI":"10.1108\/EC-07-2019-0328","article-title":"A novel parallel finite element procedure for nonlinear dynamic problems using GPU and mixed-precision algorithm","volume":"37","author":"Wang","year":"2020","journal-title":"Engineering Computations"},{"key":"2026010206312984300_bib40","doi-asserted-by":"publisher","DOI":"10.1002\/9781118382011","volume-title":"Introduction to the Explicit Finite Element Method for Nonlinear 
Transient Dynamics","author":"Wu","year":"2012"},{"key":"2026010206312984300_bib41","volume-title":"YJK Building Software","author":"YJK","year":"2025"},{"key":"2026010206312984300_bib42","doi-asserted-by":"publisher","first-page":"359","DOI":"10.1093\/jcde\/qwae053","article-title":"AI-powered fire engineering design and smoke flow analysis for complex-shaped buildings","volume":"11","author":"Zeng","year":"2024","journal-title":"Journal of Computational Design and Engineering"},{"key":"2026010206312984300_bib43","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.future.2022.09.005","article-title":"Hybrid MPI and CUDA paralleled finite volume unstructured CFD simulations on a multi-GPU system","volume":"139","author":"Zhang","year":"2023","journal-title":"Future Generation Computer Systems"},{"key":"2026010206312984300_bib44","doi-asserted-by":"publisher","first-page":"1451","DOI":"10.1109\/TPDS.2021.3115630","article-title":"A Survey of GPU Multitasking Methods Supported by Hardware Architecture","volume":"33","author":"Zhao","year":"2022","journal-title":"IEEE Transactions on Parallel and Distributed Systems"}],"container-title":["Journal of Computational Design and 
Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/jcde\/advance-article-pdf\/doi\/10.1093\/jcde\/qwaf127\/65477664\/qwaf127.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/jcde\/article-pdf\/13\/1\/141\/65477664\/qwaf127.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/jcde\/article-pdf\/13\/1\/141\/65477664\/qwaf127.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,2]],"date-time":"2026-01-02T11:31:40Z","timestamp":1767353500000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jcde\/article\/13\/1\/141\/8340365"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,22]]},"references-count":44,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026,1,2]]}},"URL":"https:\/\/doi.org\/10.1093\/jcde\/qwaf127","relation":{},"ISSN":["2288-5048"],"issn-type":[{"value":"2288-5048","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2026,1]]},"published":{"date-parts":[[2025,11,22]]}}}