{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,13]],"date-time":"2026-06-13T12:47:36Z","timestamp":1781354856833,"version":"3.54.1"},"reference-count":25,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2021,4,5]],"date-time":"2021-04-05T00:00:00Z","timestamp":1617580800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["J. Emerg. Technol. Comput. Syst."],"published-print":{"date-parts":[[2021,4,30]]},"abstract":"<jats:p>Reinforcement learning, augmented by the representational power of deep neural networks, has shown promising results on high-dimensional problems, such as game playing and robotic control. However, the sequential nature of these problems poses a fundamental challenge for computational efficiency. Recently, alternative approaches such as evolutionary strategies and deep neuroevolution demonstrated competitive results with faster training time on distributed CPU cores. Here we report record training times (running at about 1 million frames per second) for Atari 2600 games using deep neuroevolution implemented on distributed FPGAs. Combined hardware implementation of the game console, image preprocessing and the neural network in an optimized pipeline, multiplied with the system level parallelism enabled the acceleration. These results are the first application demonstration on the IBM Neural Computer, which is a custom designed system that consists of 432 Xilinx FPGAs interconnected in a 3D mesh network topology. In addition to high performance, experiments also showed improvement in accuracy for all games compared to the CPU implementation of the same algorithm.<\/jats:p>","DOI":"10.1145\/3425500","type":"journal-article","created":{"date-parts":[[2021,4,5]],"date-time":"2021-04-05T10:08:11Z","timestamp":1617617291000},"page":"1-17","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Accelerating Deep Neuroevolution on Distributed FPGAs for Reinforcement Learning Problems"],"prefix":"10.1145","volume":"17","author":[{"given":"Alexis","family":"Asseman","sequence":"first","affiliation":[{"name":"IBM Almaden Research Center, California, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Nicolas","family":"Antoine","sequence":"additional","affiliation":[{"name":"IBM Almaden Research Center, California, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ahmet S.","family":"Ozcan","sequence":"additional","affiliation":[{"name":"IBM Almaden Research Center, California, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2021,4,5]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Stella Programmer\u2019s Guide. Retrieved","author":"Wright Steve","year":"2019","unstructured":"Steve Wright . 1979. Stella Programmer\u2019s Guide. Retrieved October 4, 2019 from https:\/\/alienbill.com\/2600\/101\/docs\/stella.html. Steve Wright. 1979. Stella Programmer\u2019s Guide. Retrieved October 4, 2019 from https:\/\/alienbill.com\/2600\/101\/docs\/stella.html."},{"key":"e_1_2_1_2_1","volume-title":"Retrieved","author":"Union International Communications","year":"2011","unstructured":"International Communications Union . 2011 . Studio Encoding Parameters of Digital Television for Standard 4:3 and Wide Screen 16:9 Aspect Ratios . Retrieved February 22, 2021 from https:\/\/www.itu.int\/rec\/R-REC-BT.601\/ International Communications Union. 2011. Studio Encoding Parameters of Digital Television for Standard 4:3 and Wide Screen 16:9 Aspect Ratios. Retrieved February 22, 2021 from https:\/\/www.itu.int\/rec\/R-REC-BT.601\/"},{"key":"e_1_2_1_3_1","volume-title":"Miles Brundage, and Anil Anthony Bharath.","author":"Arulkumaran Kai","year":"2017","unstructured":"Kai Arulkumaran , Marc Peter Deisenroth , Miles Brundage, and Anil Anthony Bharath. 2017 . A brief survey of deep reinforcement learning. arXiv:1708.05866 Kai Arulkumaran, Marc Peter Deisenroth, Miles Brundage, and Anil Anthony Bharath. 2017. A brief survey of deep reinforcement learning. arXiv:1708.05866"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.5555\/2566972.2566979"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3297858.3304058"},{"key":"e_1_2_1_6_1","unstructured":"Matthieu Courbariaux Yoshua Bengio and Jean-Pierre David. 2014. Training deep neural networks with low precision multiplications. arxiv:cs.LG\/1412.7024  Matthieu Courbariaux Yoshua Bengio and Jean-Pierre David. 2014. Training deep neural networks with low precision multiplications. arxiv:cs.LG\/1412.7024"},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of the 13th International Conference on Artificial Intelligence and Statistics. 249--256","author":"Glorot Xavier","year":"2010","unstructured":"Xavier Glorot and Yoshua Bengio . 2010 . Understanding the difficulty of training deep feedforward neural networks . In Proceedings of the 13th International Conference on Artificial Intelligence and Statistics. 249--256 . Xavier Glorot and Yoshua Bengio. 2010. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the 13th International Conference on Artificial Intelligence and Statistics. 249--256."},{"key":"e_1_2_1_8_1","volume-title":"Proceedings of the 32nd AAAI Conference on Artificial Intelligence.","author":"Hessel Matteo","year":"2018","unstructured":"Matteo Hessel , Joseph Modayil , Hado Van Hasselt , Tom Schaul , Georg Ostrovski , Will Dabney , Dan Horgan , Bilal Piot , Mohammad Azar , and David Silver . 2018 . Rainbow: Combining improvements in deep reinforcement learning . In Proceedings of the 32nd AAAI Conference on Artificial Intelligence. Matteo Hessel, Joseph Modayil, Hado Van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, and David Silver. 2018. Rainbow: Combining improvements in deep reinforcement learning. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence."},{"key":"e_1_2_1_9_1","volume-title":"Charles Beattie, et\u00a0al.","author":"Jaderberg Max","year":"2019","unstructured":"Max Jaderberg , Wojciech M. Czarnecki , Iain Dunning , Luke Marris , Guy Lever , Antonio Garcia Castaneda , Charles Beattie, et\u00a0al. 2019 . Human-level performance in 3D multiplayer games with population-based reinforcement learning. Science 364, 6443 (2019), 859--865. Max Jaderberg, Wojciech M. Czarnecki, Iain Dunning, Luke Marris, Guy Lever, Antonio Garcia Castaneda, Charles Beattie, et\u00a0al. 2019. Human-level performance in 3D multiplayer games with population-based reinforcement learning. Science 364, 6443 (2019), 859--865."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/TG.2019.2896986"},{"key":"e_1_2_1_11_1","unstructured":"Yuxi Li. 2017. Deep reinforcement learning: An overview. arXiv:1701.07274  Yuxi Li. 2017. Deep reinforcement learning: An overview. arXiv:1701.07274"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.5555\/3241691.3241702"},{"key":"e_1_2_1_13_1","volume-title":"Proceedings of the International Conference on Machine Learning. 1928--1937","author":"Mnih Volodymyr","year":"2016","unstructured":"Volodymyr Mnih , Adria Puigdomenech Badia , Mehdi Mirza , Alex Graves , Timothy Lillicrap , Tim Harley , David Silver , and Koray Kavukcuoglu . 2016 . Asynchronous methods for deep reinforcement learning . In Proceedings of the International Conference on Machine Learning. 1928--1937 . Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. 2016. Asynchronous methods for deep reinforcement learning. In Proceedings of the International Conference on Machine Learning. 1928--1937."},{"key":"e_1_2_1_14_1","volume-title":"et\u00a0al","author":"Mnih Volodymyr","year":"2015","unstructured":"Volodymyr Mnih , Koray Kavukcuoglu , David Silver , Andrei A. Rusu , Joel Veness , Marc G. Bellemare , Alex Graves , et\u00a0al . 2015 . Human-level control through deep reinforcement learning. Nature 518, 7540 (2015), 529. Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, et\u00a0al. 2015. Human-level control through deep reinforcement learning. Nature 518, 7540 (2015), 529."},{"key":"e_1_2_1_15_1","volume-title":"Ozcan","author":"Narayanan Pritish","year":"2020","unstructured":"Pritish Narayanan , Charles E. Cox , Alexis Asseman , Nicolas Antoine , Harald Huels , Winfried W. Wilcke , and Ahmet S . Ozcan . 2020 . Overview of the IBM neural computer architecture. arXiv:2003.11178 Pritish Narayanan, Charles E. Cox, Alexis Asseman, Nicolas Antoine, Harald Huels, Winfried W. Wilcke, and Ahmet S. Ozcan. 2020. Overview of the IBM neural computer architecture. arXiv:2003.11178"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10846-017-0468-y"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.5594\/J06718"},{"key":"e_1_2_1_18_1","unstructured":"Felipe Petroski Such Vashisht Madhavan Edoardo Conti Joel Lehman Kenneth O. Stanley and Jeff Clune. 2017. Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning. arXiv:1712.06567  Felipe Petroski Such Vashisht Madhavan Edoardo Conti Joel Lehman Kenneth O. Stanley and Jeff Clune. 2017. Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning. arXiv:1712.06567"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/MWSCAS.2004.1354049"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00034-019-01037-w"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3020078.3021744"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3061639.3062207"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41928-018-0059-3"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3240765.3240801"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/SSCI.2016.7849837"}],"container-title":["ACM Journal on Emerging Technologies in Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3425500","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3425500","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:31:55Z","timestamp":1750195915000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3425500"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,4,5]]},"references-count":25,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2021,4,30]]}},"alternative-id":["10.1145\/3425500"],"URL":"https:\/\/doi.org\/10.1145\/3425500","relation":{},"ISSN":["1550-4832","1550-4840"],"issn-type":[{"value":"1550-4832","type":"print"},{"value":"1550-4840","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,4,5]]},"assertion":[{"value":"2020-04-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-09-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-04-05","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}