{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T07:19:04Z","timestamp":1760080744313,"version":"3.41.0"},"reference-count":33,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2023,4,19]],"date-time":"2023-04-19T00:00:00Z","timestamp":1681862400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61971031, 61671056"],"award-info":[{"award-number":["61971031, 61671056"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Scientific and Technological Innovation Foundation of Shunde Graduate School, USTB","award":["06500093, BK19AF007, BK20BF009"],"award-info":[{"award-number":["06500093, BK19AF007, BK20BF009"]}]},{"name":"Interdisciplinary research project of USTB","award":["FRF-IDRY-19-019"],"award-info":[{"award-number":["FRF-IDRY-19-019"]}]},{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities","doi-asserted-by":"crossref","award":["FRF-GF-19-018B"],"award-info":[{"award-number":["FRF-GF-19-018B"]}],"id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Foshan Higher Education Foundation","award":["BKBS202203"],"award-info":[{"award-number":["BKBS202203"]}]},{"name":"MAGICOM Platform of Beijing Advanced Innovation Center for Materials Genome Engineering"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2023,5,31]]},"abstract":"<jats:p>\n            Neural network processors and accelerators are domain-specific architectures deployed to solve the high computational requirements of deep learning algorithms. This article proposes a new instruction set extension for tensor computing, TCX, using Reduced Instruction Set Computer (RISC) instructions enhanced with variable length tensor extensions. It features a multi-dimensional register file, dimension registers, and fully generic tensor instructions. It can be seamlessly integrated into existing RISC Instruction Set Architectures and provides software compatibility for scalable hardware implementations. We present a tensor accelerator implementation of the tensor extensions using an out-of-order RISC microarchitecture. The tensor accelerator is scalable in computation units from several hundred to tens of thousands. An optimized register renaming mechanism is described that allows for many physical tensor registers without requiring architectural support for large tensor register names. We describe new tensor load and store instructions that reduce bandwidth requirements using tensor dimension registers. Implementations may balance data bandwidth and computation utilization for different types of tensor computations such as element-wise, depthwise, and matrix-multiplication. We characterize the computation precision of tensor operations to balance area, generality, and accuracy loss for several well-known neural networks. The TCX processor runs at 1 GHz and sustains 8.2 Tera operations per second using a 4,096 multiply-accumulate compute unit. It consumes 12.8 mm\n            <jats:sup>2<\/jats:sup>\n            while dissipating 0.46W\/TOPs in TSMC 28-nm technology.\n          <\/jats:p>","DOI":"10.1145\/3568310","type":"journal-article","created":{"date-parts":[[2022,10,19]],"date-time":"2022-10-19T04:29:32Z","timestamp":1666153772000},"page":"1-27","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["TCX: A RISC Style Tensor Computing Extension and a Programmable Tensor Processor"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7643-912X","authenticated-orcid":false,"given":"Tailin","family":"Liang","sequence":"first","affiliation":[{"name":"University of Science and Technology Beijing, China and Hua Xia General Processor Technologies, Haidian Qu, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1865-8153","authenticated-orcid":false,"given":"Lei","family":"Wang","sequence":"additional","affiliation":[{"name":"University of Science and Technology Beijing, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4172-8906","authenticated-orcid":false,"given":"Shaobo","family":"Shi","sequence":"additional","affiliation":[{"name":"University of Science and Technology Beijing, China and Hua Xia General Processor Technologies, Haidian Qu, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0713-2105","authenticated-orcid":false,"given":"John","family":"Glossner","sequence":"additional","affiliation":[{"name":"University of Science and Technology Beijing, China and General Processor Technologies, New York, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5122-3998","authenticated-orcid":false,"given":"Xiaotong","family":"Zhang","sequence":"additional","affiliation":[{"name":"University of Science and Technology Beijing, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2023,4,19]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2017.2761740"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2016.42"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/WCSP49889.2020.9299736"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eng.2020.01.007"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/2749469.2750389"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2017.2682138"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC19947.2020.9063078"},{"key":"e_1_3_1_9_2","unstructured":"Wang Hsiangkai Chen Zakk Cheng Kito Hsu Yi-Hsiu Ibanez Roger Knight Nick and Xing Mingjie. RISC-V Vector Extension Intrinsic Document. Retrieved from https:\/\/github.com\/riscv\/rvv-intrinsic-doc."},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2017.35"},{"key":"e_1_3_1_11_2","article-title":"Vector processor configured to operate on variable length vectors using implicitly typed instructions","author":"Moudgill Mayan","year":"2018","unstructured":"Mayan Moudgill, C. John Glossner, Arthur Joseph Hoane, Paul Hurtley, and Vitaly Kalashnikov. 2018. Vector processor configured to operate on variable length vectors using implicitly typed instructions. Patent No. 9,959,246.","journal-title":"Patent No. 9,959,246"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.521"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2021.07.045"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00232"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPEC43674.2020.9286149"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSI.2019.2942529"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2017.2688340"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2016.2616357"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/JETCAS.2019.2910232"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/GlobalSIP.2015.7418430"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC42614.2022.9731757"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1145\/2996864"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/2541940.2541967"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2014.58"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/2694344.2694358"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2019.2928962"},{"key":"e_1_3_1_27_2","first-page":"579","volume-title":"Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201918)","author":"Chen Tianqi","year":"2018","unstructured":"Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Meghan Cowan, Haichen Shen, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy. 2018. TVM: An automated end-to-end optimizing compiler for deep learning. In Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201918). 579\u2013594. http:\/\/arxiv.org\/abs\/1802.04799."},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1147\/rd.111.0025"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSI.2018.2840092"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSI.2017.2735490"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSI.2017.2757036"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2017.2764045"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN.2017.7966159"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSI.2015.2495722"}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3568310","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3568310","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:51:33Z","timestamp":1750182693000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3568310"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,4,19]]},"references-count":33,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2023,5,31]]}},"alternative-id":["10.1145\/3568310"],"URL":"https:\/\/doi.org\/10.1145\/3568310","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"type":"print","value":"1539-9087"},{"type":"electronic","value":"1558-3465"}],"subject":[],"published":{"date-parts":[[2023,4,19]]},"assertion":[{"value":"2022-02-12","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-10-10","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-04-19","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}