{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,27]],"date-time":"2026-03-27T16:25:07Z","timestamp":1774628707327,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":13,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,2,17]],"date-time":"2021-02-17T00:00:00Z","timestamp":1613520000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,2,17]]},"DOI":"10.1145\/3431920.3439293","type":"proceedings-article","created":{"date-parts":[[2021,2,20]],"date-time":"2021-02-20T23:15:47Z","timestamp":1613862947000},"page":"57-67","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":41,"title":["Stratix 10 NX Architecture and Applications"],"prefix":"10.1145","author":[{"given":"Martin","family":"Langhammer","sequence":"first","affiliation":[{"name":"Intel Corporation, Salisbury, United Kingdom"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Eriko","family":"Nurvitadhi","sequence":"additional","affiliation":[{"name":"Intel Corporation, Portland, OR, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bogdan","family":"Pasca","sequence":"additional","affiliation":[{"name":"Intel Corporation, Toulouse, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sergey","family":"Gribok","sequence":"additional","affiliation":[{"name":"Intel Corporation, San Jose, CA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,2,17]]},"reference":[{"key":"e_1_3_2_2_1_1","volume-title":"Think Fast: A Tensor Streaming Processor (TSP) for Accelerating Deep Learning Workloads. In 2020 ACM\/IEEE 47th Annual International Symposium on Computer Architecture (ISCA). 145--158","author":"Abts D.","unstructured":"D. Abts , J. Ross , J. Sparling , M. Wong-VanHaren , M. Baker , T. Hawkins , A. Bell , J. Thompson , T. Kahsai , G. Kimmell , J. Hwang , R. Leslie-Hurd , M. Bye , E. R. Creswick , M. Boyd , M. Venigalla , E. Laforge , J. Purdy , P. Kamath , D. Maheshwari , M. Beidler , G. Rosseel , O. Ahmad , G. Gagarin , R. Czekalski , A. Rane , S. Parmar , J. Werner , J. Sproch , A. Macias , and B. Kurtz . 2020 . Think Fast: A Tensor Streaming Processor (TSP) for Accelerating Deep Learning Workloads. In 2020 ACM\/IEEE 47th Annual International Symposium on Computer Architecture (ISCA). 145--158 . D. Abts, J. Ross, J. Sparling, M. Wong-VanHaren, M. Baker, T. Hawkins, A. Bell, J. Thompson, T. Kahsai, G. Kimmell, J. Hwang, R. Leslie-Hurd, M. Bye, E. R. Creswick, M. Boyd, M. Venigalla, E. Laforge, J. Purdy, P. Kamath, D. Maheshwari, M. Beidler, G. Rosseel, O. Ahmad, G. Gagarin, R. Czekalski, A. Rane, S. Parmar, J. Werner, J. Sproch, A. Macias, and B. Kurtz. 2020. Think Fast: A Tensor Streaming Processor (TSP) for Accelerating Deep Learning Workloads. In 2020 ACM\/IEEE 47th Annual International Symposium on Computer Architecture (ISCA). 145--158."},{"key":"e_1_3_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/3242897"},{"key":"e_1_3_2_2_3_1","volume-title":"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. CoRR","author":"Devlin Jacob","year":"2018","unstructured":"Jacob Devlin , Ming-Wei Chang , Kenton Lee , and Kristina Toutanova . 2018 . BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. CoRR , Vol. abs\/ 1810 .04805 (2018). arxiv: 1810.04805 http:\/\/arxiv.org\/abs\/1810.04805 Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. CoRR, Vol. abs\/1810.04805 (2018). arxiv: 1810.04805 http:\/\/arxiv.org\/abs\/1810.04805"},{"key":"e_1_3_2_2_4_1","volume-title":"2018 ACM\/IEEE 45th Annual International Symposium on Computer Architecture (ISCA). 1--14","author":"Fowers J.","unstructured":"J. Fowers , K. Ovtcharov , M. Papamichael , T. Massengill , M. Liu , D. Lo , S. Alkalay , M. Haselman , L. Adams , M. Ghandi , S. Heil , P. Patel , A. Sapek , G. Weisz , L. Woods , S. Lanka , S. K. Reinhardt , A. M. Caulfield , E. S. Chung , and D. Burger . 2018. A Configurable Cloud-Scale DNN Processor for Real-Time AI . In 2018 ACM\/IEEE 45th Annual International Symposium on Computer Architecture (ISCA). 1--14 . J. Fowers, K. Ovtcharov, M. Papamichael, T. Massengill, M. Liu, D. Lo, S. Alkalay, M. Haselman, L. Adams, M. Ghandi, S. Heil, P. Patel, A. Sapek, G. Weisz, L. Woods, S. Lanka, S. K. Reinhardt, A. M. Caulfield, E. S. Chung, and D. Burger. 2018. A Configurable Cloud-Scale DNN Processor for Real-Time AI. In 2018 ACM\/IEEE 45th Annual International Symposium on Computer Architecture (ISCA). 1--14."},{"key":"e_1_3_2_2_5_1","volume-title":"Article arXiv:2005.04680 (May","author":"Kalamkar Dhiraj","year":"2020","unstructured":"Dhiraj Kalamkar , Evangelos Georganas , Sudarshan Srinivasan , Jianping Chen , Mikhail Shiryaev , and Alexander Heinecke . 2020. Optimizing Deep Learning Recommender Systems' Training On CPU Cluster Architectures. arXiv e-prints , Article arXiv:2005.04680 (May 2020 ), arXiv:2005.04680 pages.arxiv: cs.DC\/2005.04680 Dhiraj Kalamkar, Evangelos Georganas, Sudarshan Srinivasan, Jianping Chen, Mikhail Shiryaev, and Alexander Heinecke. 2020. Optimizing Deep Learning Recommender Systems' Training On CPU Cluster Architectures. arXiv e-prints, Article arXiv:2005.04680 (May 2020), arXiv:2005.04680 pages.arxiv: cs.DC\/2005.04680"},{"key":"e_1_3_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/3289602.3293927"},{"key":"e_1_3_2_2_7_1","volume-title":"SpiderWeb - High Performance FPGA NoC. In 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). 115--118","author":"Langhammer M.","unstructured":"M. Langhammer , G. Baeckler , and S. Gribok . 2020 a . SpiderWeb - High Performance FPGA NoC. In 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). 115--118 . M. Langhammer, G. Baeckler, and S. Gribok. 2020 a. SpiderWeb - High Performance FPGA NoC. In 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). 115--118."},{"key":"e_1_3_2_2_8_1","volume-title":"2020 IEEE 28th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). 84--92","author":"Langhammer M.","unstructured":"M. Langhammer , S. Gribok , and G. Baeckler . 2020 b. High Density 8-Bit Multiplier Systolic Arrays For FPGA . In 2020 IEEE 28th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). 84--92 . M. Langhammer, S. Gribok, and G. Baeckler. 2020 b. High Density 8-Bit Multiplier Systolic Arrays For FPGA. In 2020 IEEE 28th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). 84--92."},{"key":"e_1_3_2_2_9_1","volume-title":"V100 GPU architecture. The world's most advanced data center GPU. Version WP-08608-001_v1. 1. NVIDIA. Aug","author":"Tesla NVIDIA.","year":"2017","unstructured":"Tesla NVIDIA. 2017. V100 GPU architecture. The world's most advanced data center GPU. Version WP-08608-001_v1. 1. NVIDIA. Aug ( 2017 ), 108. Tesla NVIDIA. 2017. V100 GPU architecture. The world's most advanced data center GPU. Version WP-08608-001_v1. 1. NVIDIA. Aug (2017), 108."},{"key":"e_1_3_2_2_10_1","volume-title":"Activation Function Architectures for FPGAs. In 2018 28th International Conference on Field Programmable Logic and Applications (FPL). 43--437","author":"Pasca B.","year":"2018","unstructured":"B. Pasca and M. Langhammer . 2018 . Activation Function Architectures for FPGAs. In 2018 28th International Conference on Field Programmable Logic and Applications (FPL). 43--437 . https:\/\/doi.org\/10.1109\/FPL. 2018 .00015 B. Pasca and M. Langhammer. 2018. Activation Function Architectures for FPGAs. In 2018 28th International Conference on Field Programmable Logic and Applications (FPL). 43--437. https:\/\/doi.org\/10.1109\/FPL.2018.00015"},{"key":"e_1_3_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/FPL.2019.00061"},{"key":"e_1_3_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.23919\/FPL.2017.8056794"},{"key":"e_1_3_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3289602.3293925"}],"event":{"name":"FPGA '21: The 2021 ACM\/SIGDA International Symposium on Field Programmable Gate Arrays","location":"Virtual Event USA","acronym":"FPGA '21","sponsor":["SIGDA ACM Special Interest Group on Design Automation"]},"container-title":["The 2021 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3431920.3439293","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3431920.3439293","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:31:31Z","timestamp":1750195891000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3431920.3439293"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,2,17]]},"references-count":13,"alternative-id":["10.1145\/3431920.3439293","10.1145\/3431920"],"URL":"https:\/\/doi.org\/10.1145\/3431920.3439293","relation":{},"subject":[],"published":{"date-parts":[[2021,2,17]]},"assertion":[{"value":"2021-02-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}