{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,15]],"date-time":"2026-04-15T21:07:47Z","timestamp":1776287267558,"version":"3.50.1"},"reference-count":8,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2017,1,11]],"date-time":"2017-01-11T00:00:00Z","timestamp":1484092800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["SIGARCH Comput. Archit. News"],"published-print":{"date-parts":[[2017,1,11]]},"abstract":"<jats:p>Deep Q-learning (DQN) is a recently proposed reinforcement learning algorithm where a neural network is applied as a non-linear approximator to its value function. The exploitation-exploration mechanism allows the training and prediction of the NN to execute simultaneously in an agent during its interaction with the environment. Agents often act independently on battery power, so the training and prediction must occur within the agent and on a limited power budget. In this work, we propose an FPGA acceleration system design for Neural Network Q-learning (NNQL). Our proposed system has high flexibility due to its support for run-time network parameterization, which allows neuroevolution algorithms to dynamically restructure the network to achieve better learning results. Additionally, the power consumption of our proposed system is adaptive to the network size because of a new processing element design. 
Based on our test cases on networks with hidden layer sizes ranging from 32 to 16384, our proposed system achieves a 7x to 346x speedup compared to a GPU implementation and a 22x to 77x speedup compared to a hand-coded CPU counterpart.<\/jats:p>","DOI":"10.1145\/3039902.3039915","type":"journal-article","created":{"date-parts":[[2017,1,17]],"date-time":"2017-01-17T13:42:08Z","timestamp":1484660528000},"page":"68-73","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":40,"title":["Neural Network Based Reinforcement Learning Acceleration on FPGA Platforms"],"prefix":"10.1145","volume":"44","author":[{"given":"Jiang","family":"Su","sequence":"first","affiliation":[{"name":"Imperial College London"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jianxiong","family":"Liu","sequence":"additional","affiliation":[{"name":"Imperial College London"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"David B.","family":"Thomas","sequence":"additional","affiliation":[{"name":"Imperial College London"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Peter Y.K.","family":"Cheung","sequence":"additional","affiliation":[{"name":"Imperial College London"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2017,1,11]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"NIPS","author":"Bastien F.","year":"2012"},{"key":"e_1_2_1_2_1","unstructured":"A. Karpathy et al. Convnetjs deep q learning demo. http:\/\/cs.stanford.edu\/people\/karpathy\/convnetjs\/."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/FPL.2009.5272262"},{"key":"e_1_2_1_4_1","volume-title":"Nature","author":"Mnih V.","year":"2015"},{"key":"e_1_2_1_5_1","volume-title":"Nature","author":"Rumelhart D. E.","year":"1986"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/CEC.2003.1299410"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2016.23"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/FPL.2010.25"}],"container-title":["ACM SIGARCH Computer Architecture News"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3039902.3039915","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3039902.3039915","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T03:36:31Z","timestamp":1750217791000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3039902.3039915"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,1,11]]},"references-count":8,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2017,1,11]]}},"alternative-id":["10.1145\/3039902.3039915"],"URL":"https:\/\/doi.org\/10.1145\/3039902.3039915","relation":{},"ISSN":["0163-5964"],"issn-type":[{"value":"0163-5964","type":"print"}],"subject":[],"published":{"date-parts":[[2017,1,11]]},"assertion":[{"value":"2017-01-11","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}