{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,20]],"date-time":"2026-06-20T16:21:13Z","timestamp":1781972473270,"version":"3.54.5"},"reference-count":23,"publisher":"World Scientific Pub Co Pte Ltd","issue":"15","funder":[{"name":"Ministry of Science and Technology, Taiwan, MOST","award":["107-2221-E-024-003"],"award-info":[{"award-number":["107-2221-E-024-003"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J CIRCUIT SYST COMP"],"published-print":{"date-parts":[[2023,10]]},"abstract":"<jats:p> In deep learning, convolutional neural networks\u00a0(CNNs) are a class of artificial neural networks\u00a0(ANNs), most commonly applied to analyze visual imagery.\u00a0They are also known as\u00a0Shift-Invariant\u00a0or\u00a0Space-Invariant Artificial Neural Networks\u00a0(SIANNs), based on the shared-weight architecture of the convolution kernels or filters that slide along input features and provide translation-equivariant\u00a0responses known as feature maps. Recently, various architectures for CNN based on FPGA platform have been proposed because it has the advantages of high performance and fast development cycle. However, some key issues including how to optimize the performance of CNN layers with different structures, high-performance heterogeneous accelerator design, and how to reduce the neural network framework integration overhead need to be improved. To overcome and improve these problems, we propose dynamic cycle pipeline tiling, data layout optimization, and a pipelined software and hardware (SW\u2013HW)-integrated architecture with flexibility and integration. Some benchmarks have been tested and implemented on the FPGA board for the proposed architecture. The proposed dynamic tiling and data layout transformation improved by 2.3 times in the performance. Moreover, with two-level pipelining, we achieve up to five times speedup and the proposed system is 3.8 times more energy-efficient than the GPU. <\/jats:p>","DOI":"10.1142\/s0218126623502547","type":"journal-article","created":{"date-parts":[[2023,3,10]],"date-time":"2023-03-10T07:04:40Z","timestamp":1678431880000},"source":"Crossref","is-referenced-by-count":5,"title":["Optimizing FPGA-Based Convolutional Neural Network Performance"],"prefix":"10.1142","volume":"32","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3174-9367","authenticated-orcid":false,"given":"Chi-Chou","family":"Kao","sequence":"first","affiliation":[{"name":"Department of Computer Science and Information Engineering, National University of Tainan, Tainan City 700, Taiwan, R. O. C."}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"219","published-online":{"date-parts":[[2023,4,21]]},"reference":[{"key":"S0218126623502547BIB001","first-page":"32","volume-title":"Proc. Int. Conf. Field Programmable Logic and Applications","author":"Farabet C.","year":"2009"},{"key":"S0218126623502547BIB003","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1109\/TPAMI.2012.59","volume":"35","author":"Ji S.","year":"2013","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"S0218126623502547BIB004","first-page":"1097","volume-title":"Advances in Neural Information Processing Systems","volume":"25","author":"Krizhevsky A.","year":"2012"},{"key":"S0218126623502547BIB006","first-page":"1","volume-title":"Proc. IEEE Conf. Computer Vision and Pattern Recognition","author":"Szegedy C.","year":"2015"},{"key":"S0218126623502547BIB009","first-page":"161","volume-title":"Proc. ACM\/SIGDA Int. Symp. Field-Programmable Gate Arrays","author":"Zhang C.","year":"2015"},{"key":"S0218126623502547BIB010","doi-asserted-by":"crossref","first-page":"127","DOI":"10.1109\/JSSC.2016.2616357","volume":"52","author":"Chen Y.-H.","year":"2017","journal-title":"IEEE J. Solid-State Circuits"},{"key":"S0218126623502547BIB011","volume-title":"Proc. IEEE\/ACM Int. Conf. Computer-Aided Design","author":"Zhang X.","year":"2018"},{"key":"S0218126623502547BIB012","first-page":"310","volume-title":"Proc. Int. Conf. Field-Programmable Technology (FPT)","author":"Akira J.","year":"2018"},{"key":"S0218126623502547BIB014","volume-title":"Proc. 8th USENIX Workshop on Hot Topics in Cloud Computing","author":"Chen Y.-T.","year":"2016"},{"key":"S0218126623502547BIB015","first-page":"53","volume-title":"Proc. 20th IEEE Int. Conf. Application-specific Systems, Architectures and Processors","author":"Sankaradas M.","year":"2009"},{"key":"S0218126623502547BIB016","first-page":"55","volume-title":"Proc. ACM\/SIGDA Int. Symp. Field-Programmable Gate Arrays","author":"Aydonat U.","year":"2017"},{"key":"S0218126623502547BIB017","first-page":"45","volume-title":"Proc. ACM\/SIGDA FPGA Int. Symp. Field-Programmable Gate Arrays","author":"Ma Y.","year":"2017"},{"key":"S0218126623502547BIB018","volume-title":"Proc. 54th ACM\/EDAC\/IEEE Design Automation Conf.","author":"Wei X.","year":"2017"},{"key":"S0218126623502547BIB019","first-page":"161","volume-title":"Proc. ACM\/SIGDA Int. Symp. Field-Programmable Gate Arrays","author":"Zhang C.","year":"2015"},{"key":"S0218126623502547BIB020","first-page":"25","volume-title":"Proc. ACM\/SIGDA Int. Symp. Field-Programmable Gate Arrays","author":"Zhang J.","year":"2017"},{"key":"S0218126623502547BIB021","volume-title":"Proc. 26th Int. Conf. Field Programmable Logic and Applications","author":"Li H.","year":"2016"},{"key":"S0218126623502547BIB022","first-page":"535","volume-title":"Proc. 44th Annu. Int. Symp. Computer Architecture","author":"Shen Y.","year":"2017"},{"key":"S0218126623502547BIB023","first-page":"65","volume-title":"Proc. ACM\/SIGDA FPGA Int. Symp. Field-Programmable Gate Arrays","author":"Umuroglu Y.","year":"2017"},{"key":"S0218126623502547BIB024","first-page":"326","volume-title":"Proc. Int. Symp. Low Power Electronics and Design","author":"Zhang C.","year":"2016"},{"key":"S0218126623502547BIB025","first-page":"1","volume-title":"Proc. 44th Annu. Int. Symp. Computer Architecture","author":"Jouppi N. P.","year":"2017"},{"key":"S0218126623502547BIB027","first-page":"152","volume-title":"Proc. IEEE 25th Annu. Int. Symp. Field-Programmable Custom Computing Machines (FCCM)","author":"Guan Y.","year":"2017"},{"key":"S0218126623502547BIB028","first-page":"1","volume-title":"Proc. FSP Fifth Int. Workshop FPGAs for Software Programmers (VDE)","author":"Noronha D. H.","year":"2018"},{"key":"S0218126623502547BIB029","first-page":"16","volume-title":"Proc. ACM\/SIGDA Int. Symp. Field-Programmable Gate Arrays","author":"Suda N.","year":"2016"}],"container-title":["Journal of Circuits, Systems and Computers"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S0218126623502547","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,10,10]],"date-time":"2023-10-10T07:48:18Z","timestamp":1696924098000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/10.1142\/S0218126623502547"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,4,21]]},"references-count":23,"journal-issue":{"issue":"15","published-print":{"date-parts":[[2023,10]]}},"alternative-id":["10.1142\/S0218126623502547"],"URL":"https:\/\/doi.org\/10.1142\/s0218126623502547","relation":{},"ISSN":["0218-1266","1793-6454"],"issn-type":[{"value":"0218-1266","type":"print"},{"value":"1793-6454","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,4,21]]},"article-number":"2350254"}}