{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T14:57:33Z","timestamp":1740149853231,"version":"3.37.3"},"reference-count":13,"publisher":"Wiley","license":[{"start":{"date-parts":[[2018,7,2]],"date-time":"2018-07-02T00:00:00Z","timestamp":1530489600000},"content-version":"unspecified","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["91648116","2017YFC1500601"],"award-info":[{"award-number":["91648116","2017YFC1500601"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"National Key R&D Program of China","award":["91648116","2017YFC1500601"],"award-info":[{"award-number":["91648116","2017YFC1500601"]}]},{"name":"Xilinx University Program","award":["91648116","2017YFC1500601"],"award-info":[{"award-number":["91648116","2017YFC1500601"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["International Journal of Reconfigurable Computing"],"published-print":{"date-parts":[[2018,7,2]]},"abstract":"<jats:p>CPU has insufficient resources to satisfy the efficient computation of the convolution neural network (CNN), especially for embedded applications. Therefore, heterogeneous computing platforms are widely used to accelerate CNN tasks, such as GPU, FPGA, and ASIC. Among these, FPGA can accelerate the computation by mapping the algorithm to the parallel hardware instead of CPU, which cannot fully exploit the parallelism. By fully using the parallelism of the neural network\u2019s structure, FPGA can reduce the computing costs and increase the computing speed. However, the development of FPGA requires great design skills. As a heterogeneous development platform, OpenCL has some advantages such as high abstraction level, short development cycle, and strong portability, which can make up for the lack of skilled designers. This paper uses Xilinx SDAccel to realize the parallel acceleration of CNN task, and it also proposes an optimizing strategy of single convolutional layer to accelerate CNN. Simulation results show that the calculation speed could be improved by adopting the proposed optimizing strategy. Compared with the baseline design, the strategy of single convolutional layer could increase the computing speed 14 times. Performance of the whole CNN task could be improved 2 times more than before, and the speed of image classification could attain more than 48 fps.<\/jats:p>","DOI":"10.1155\/2018\/1785892","type":"journal-article","created":{"date-parts":[[2018,7,2]],"date-time":"2018-07-02T19:42:44Z","timestamp":1530560564000},"page":"1-10","source":"Crossref","is-referenced-by-count":7,"title":["Design of FPGA-Based Accelerator for Convolutional Neural Network under Heterogeneous Computing Framework with OpenCL"],"prefix":"10.1155","volume":"2018","author":[{"given":"Li","family":"Luo","sequence":"first","affiliation":[{"name":"Department of Electronic Science and Technology, Beijing Jiaotong University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yakun","family":"Wu","sequence":"additional","affiliation":[{"name":"Department of Electronic Science and Technology, Beijing Jiaotong University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5054-9590","authenticated-orcid":true,"given":"Fei","family":"Qiao","sequence":"additional","affiliation":[{"name":"Department of Electronic Engineering, Tsinghua University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yi","family":"Yang","sequence":"additional","affiliation":[{"name":"Department of Electronic Engineering, Tsinghua University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9882-247X","authenticated-orcid":true,"given":"Qi","family":"Wei","sequence":"additional","affiliation":[{"name":"Department of Electronic Engineering, Tsinghua University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiaobo","family":"Zhou","sequence":"additional","affiliation":[{"name":"Department of Electronic Science and Technology, Beijing Jiaotong University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4537-4650","authenticated-orcid":true,"given":"Yongkai","family":"Fan","sequence":"additional","affiliation":[{"name":"China University of Petroleum, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shuzheng","family":"Xu","sequence":"additional","affiliation":[{"name":"Department of Electronic Engineering, Tsinghua University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8252-8367","authenticated-orcid":true,"given":"Xinjun","family":"Liu","sequence":"additional","affiliation":[{"name":"Department of Mechanical Engineering, Tsinghua University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Huazhong","family":"Yang","sequence":"additional","affiliation":[{"name":"Department of Electronic Engineering, Tsinghua University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"311","reference":[{"key":"2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2012.59"},{"key":"3","doi-asserted-by":"publisher","DOI":"10.1109\/5.726791"},{"key":"4","first-page":"1097","volume-title":"Imagenet classification with deep convolutional neural networks","volume":"60","year":"2017"},{"key":"5","doi-asserted-by":"publisher","DOI":"10.1145\/3065386"},{"key":"7","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-11179-7_51"},{"year":"2002","key":"8"},{"key":"12","doi-asserted-by":"publisher","DOI":"10.1155\/2015\/139238"},{"key":"17","doi-asserted-by":"publisher","DOI":"10.1109\/MCSE.2010.69"},{"journal-title":"The Khronos OpenCL Working Group","year":"2011","key":"18"},{"volume":"2017","journal-title":"International Journal of Reconfigurable Computing","year":"2017","key":"19"},{"volume-title":"Implementing FPGA design with the OpenCL standard","year":"2011","key":"22"},{"year":"2015","key":"25"},{"year":"2007","key":"26"}],"container-title":["International Journal of Reconfigurable Computing"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/downloads.hindawi.com\/journals\/ijrc\/2018\/1785892.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/downloads.hindawi.com\/journals\/ijrc\/2018\/1785892.xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/downloads.hindawi.com\/journals\/ijrc\/2018\/1785892.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2018,7,2]],"date-time":"2018-07-02T19:42:51Z","timestamp":1530560571000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.hindawi.com\/journals\/ijrc\/2018\/1785892\/"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,7,2]]},"references-count":13,"alternative-id":["1785892","1785892"],"URL":"https:\/\/doi.org\/10.1155\/2018\/1785892","relation":{},"ISSN":["1687-7195","1687-7209"],"issn-type":[{"type":"print","value":"1687-7195"},{"type":"electronic","value":"1687-7209"}],"subject":[],"published":{"date-parts":[[2018,7,2]]}}}