{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,24]],"date-time":"2026-04-24T15:08:06Z","timestamp":1777043286997,"version":"3.51.4"},"reference-count":61,"publisher":"IOP Publishing","issue":"1","license":[{"start":{"date-parts":[[2023,2,24]],"date-time":"2023-02-24T00:00:00Z","timestamp":1677196800000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,2,24]],"date-time":"2023-02-24T00:00:00Z","timestamp":1677196800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/iopscience.iop.org\/info\/page\/text-and-data-mining"}],"funder":[{"name":"PKU-Baidu Fund","award":["Project 2020BD010"],"award-info":[{"award-number":["Project 2020BD010"]}]},{"name":"The 111 Project","award":["B18001"],"award-info":[{"award-number":["B18001"]}]},{"name":"Tencent Foundation through the XPLORER PRIZE"},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61925401"],"award-info":[{"award-number":["61925401"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100004806","name":"Fok Ying-Tong Education Foundation","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100004806","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["iopscience.iop.org"],"crossmark-restriction":false},"short-container-title":["Neuromorph. Comput. Eng."],"published-print":{"date-parts":[[2023,3,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Deep neural networks (DNNs) are one of the key fields of machine learning. They require considerable computational resources for cognitive tasks. 
As a novel technology that performs computation inside\/near memory units, in-memory computing (IMC) significantly improves computing efficiency by reducing repetitive data transfer between the processing and memory units. However, prior IMC designs mainly focus on accelerating DNN inference, and DNN training on IMC hardware has rarely been proposed. The challenges lie in DNN training's need for high precision (e.g. floating point (FP)) and for diverse tensor operations (e.g. inner and outer products). These challenges call for IMC designs with new features. This paper proposes a novel Hadamard product-based IMC design for FP DNN training. Our design consists of multiple compartments, which are the basic units for element-wise matrix processing. We also develop BFloat16 post-processing circuits and fused adder trees, laying the foundation for IMC FP processing. Based on the proposed circuit scheme, we reformulate the back-propagation training algorithm to make IMC execution convenient and efficient. The proposed design is implemented with commercial 28\u2009nm technology process design kits and benchmarked with widely used neural networks. We model the influence of circuit structural design parameters and provide an analysis framework for design space exploration. 
Our simulation validates that MobileNet training with the proposed IMC scheme saves 91.2% in energy and 13.9% in time compared with the same task on an NVIDIA GTX 3060 GPU. 
The proposed IMC design has a data density of 769.2\u2009Kb\u2009mm<jats:sup>\u22122<\/jats:sup> with the FP processing circuits included, showing a 3.5\u2009\u00d7 improvement over prior FP IMC designs.<\/jats:p>","DOI":"10.1088\/2634-4386\/acbab9","type":"journal-article","created":{"date-parts":[[2023,2,9]],"date-time":"2023-02-09T22:29:00Z","timestamp":1675981740000},"page":"014009","update-policy":"https:\/\/doi.org\/10.1088\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Hadamard product-based in-memory computing design for floating point neural network training"],"prefix":"10.1088","volume":"3","author":[{"given":"Anjunyi","family":"Fan","sequence":"first","affiliation":[]},{"given":"Yihan","family":"Fu","sequence":"additional","affiliation":[]},{"given":"Yaoyu","family":"Tao","sequence":"additional","affiliation":[]},{"given":"Zhonghua","family":"Jin","sequence":"additional","affiliation":[]},{"given":"Haiyue","family":"Han","sequence":"additional","affiliation":[]},{"given":"Huiyu","family":"Liu","sequence":"additional","affiliation":[]},{"given":"Yaojun","family":"Zhang","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3052-9330","authenticated-orcid":true,"given":"Bonan","family":"Yan","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4674-4059","authenticated-orcid":true,"given":"Yuchao","family":"Yang","sequence":"additional","affiliation":[]},{"given":"Ru","family":"Huang","sequence":"additional","affiliation":[]}],"member":"266","published-online":{"date-parts":[[2023,2,24]]},"reference":[{"key":"nceacbab9bib1","doi-asserted-by":"publisher","first-page":"436","DOI":"10.1038\/nature14539","volume":"521","author":"LeCun","year":"2015","journal-title":"Nature"},{"key":"nceacbab9bib2","first-page":"pp 335","article-title":"Towards energy efficient non-von Neumann architectures for deep 
learning","author":"Ganguly","year":"2019"},{"key":"nceacbab9bib3","doi-asserted-by":"publisher","first-page":"43","DOI":"10.1109\/MSSC.2019.2922889","volume":"11","author":"Verma","year":"2019","journal-title":"IEEE Solid-State Circuits Mag."},{"key":"nceacbab9bib4","doi-asserted-by":"publisher","first-page":"1123","DOI":"10.1109\/TCAD.2019.2907886","volume":"39","author":"Angizi","year":"2019","journal-title":"IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst."},{"key":"nceacbab9bib5","doi-asserted-by":"publisher","DOI":"10.1002\/advs.202103357","volume":"9","author":"Zhang","year":"2022","journal-title":"Adv. Sci."},{"key":"nceacbab9bib6","first-page":"pp 1865","article-title":"Spinlim: spin orbit torque memory for ternary neural networks based on the logic-in-memory architecture","author":"Luo","year":"2021"},{"key":"nceacbab9bib7","doi-asserted-by":"publisher","first-page":"75","DOI":"10.3390\/make1010005","volume":"1","author":"Mittal","year":"2018","journal-title":"Mach. Learn. Knowl. Extr."},{"key":"nceacbab9bib8","first-page":"pp T86","article-title":"RRAM-based spiking nonvolatile computing-in-memory processing engine with precision-configurable in situ nonlinear activation","author":"Yan","year":"2019"},{"key":"nceacbab9bib9","doi-asserted-by":"publisher","first-page":"137","DOI":"10.1038\/s41928-018-0023-2","volume":"1","author":"Wang","year":"2018","journal-title":"Nat. Electron."},{"key":"nceacbab9bib10","first-page":"pp 1","article-title":"Computing in memory with FeFETs","author":"Reis","year":"2018"},{"key":"nceacbab9bib11","doi-asserted-by":"publisher","first-page":"2094","DOI":"10.1109\/TED.2022.3142239","volume":"69","author":"Aabrar","year":"2022","journal-title":"IEEE Trans. Electron Devices"},{"key":"nceacbab9bib12","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3473461","volume":"18","author":"Luo","year":"2022","journal-title":"ACM J. Emerg. Technol. Comput. 
Syst."},{"key":"nceacbab9bib13","doi-asserted-by":"publisher","first-page":"1358","DOI":"10.1109\/LED.2019.2928335","volume":"40","author":"Lee","year":"2019","journal-title":"IEEE Electron Device Lett."},{"key":"nceacbab9bib14","doi-asserted-by":"publisher","first-page":"4782","DOI":"10.1109\/TNNLS.2017.2778940","volume":"29","author":"Merrikh-Bayat","year":"2018","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"nceacbab9bib15","first-page":"pp 1","article-title":"Introduction of 3d and-type flash memory and it\u2019s applications to computing-in-memory (cim)","author":"Lue","year":"2021"},{"key":"nceacbab9bib16","first-page":"pp 282","article-title":"SISA: set-centric instruction set architecture for graph mining on processing-in-memory systems","author":"Besta","year":"2021"},{"key":"nceacbab9bib17","doi-asserted-by":"publisher","first-page":"1576","DOI":"10.1109\/TCSII.2021.3069011","volume":"68","author":"Meng","year":"2021","journal-title":"IEEE Trans. Circuits Syst. II"},{"key":"nceacbab9bib18","doi-asserted-by":"publisher","first-page":"607","DOI":"10.1038\/s41586-019-1677-2","volume":"575","author":"Roy","year":"2019","journal-title":"Nature"},{"key":"nceacbab9bib19","doi-asserted-by":"publisher","first-page":"617","DOI":"10.1109\/TC.2018.2879502","volume":"68","author":"Zhao","year":"2018","journal-title":"IEEE Trans. Comput."},{"key":"nceacbab9bib20","first-page":"pp 541","article-title":"Pipelayer: a pipelined ReRAM-based accelerator for deep learning","author":"Song","year":"2017"},{"key":"nceacbab9bib21","doi-asserted-by":"publisher","first-page":"4172","DOI":"10.1109\/TCSI.2019.2928043","volume":"66","author":"Si","year":"2019","journal-title":"IEEE Trans. Circuits Syst. I"},{"key":"nceacbab9bib22","doi-asserted-by":"publisher","first-page":"214","DOI":"10.1109\/TCSI.2022.3216735","volume":"70","author":"Wang","year":"2022","journal-title":"IEEE Trans. Circuits Syst. 
I"},{"key":"nceacbab9bib23","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/JSSC.2022.3198413","author":"Guo","year":"2022","journal-title":"IEEE J. Solid-State Circuits"},{"key":"nceacbab9bib24","first-page":"pp 250","article-title":"A 28nm 384Kb 6T-SRAM computation-in-memory macro with 8b precision for AI edge chips","volume":"vol 64","author":"Su","year":"2021"},{"key":"nceacbab9bib25","doi-asserted-by":"publisher","first-page":"529","DOI":"10.1038\/s41565-020-0655-z","volume":"15","author":"Sebastian","year":"2020","journal-title":"Nat. Nanotechnol."},{"key":"nceacbab9bib26","doi-asserted-by":"publisher","first-page":"333","DOI":"10.1038\/s41928-018-0092-2","volume":"1","author":"Ielmini","year":"2018","journal-title":"Nat. Electron."},{"key":"nceacbab9bib27","doi-asserted-by":"publisher","first-page":"158","DOI":"10.1109\/JSSC.2018.2869150","volume":"54","author":"Bankman","year":"2019","journal-title":"IEEE J. Solid-State Circuits"},{"key":"nceacbab9bib28","doi-asserted-by":"publisher","first-page":"217","DOI":"10.1109\/JSSC.2018.2880918","volume":"54","author":"Biswas","year":"2019","journal-title":"IEEE J. Solid-State Circuits"},{"key":"nceacbab9bib29","first-page":"pp 496","article-title":"A 65nm 4Kb algorithm-dependent computing-in-memory SRAM unit-macro with 2.3ns and 55.8TOPS\/W fully parallel product-sum operation for binary DNN edge processors","author":"Khwa","year":"2018"},{"key":"nceacbab9bib30","doi-asserted-by":"publisher","first-page":"1789","DOI":"10.1109\/JSSC.2019.2899730","volume":"54","author":"Valavi","year":"2019","journal-title":"IEEE J. Solid-State Circuits"},{"key":"nceacbab9bib31","doi-asserted-by":"publisher","DOI":"10.1002\/aisy.201900068","volume":"1","author":"Yan","year":"2019","journal-title":"Adv. Intell. Syst."},{"key":"nceacbab9bib32","doi-asserted-by":"publisher","first-page":"1773","DOI":"10.1109\/TCSI.2021.3064189","volume":"68","author":"Jhang","year":"2021","journal-title":"IEEE Trans. Circuits Syst. 
I"},{"key":"nceacbab9bib33","first-page":"p 3.1.1","article-title":"Exploiting hybrid precision for training and inference: a 2T-1FeFET based analog synaptic weight cell","author":"Sun","year":"2018"},{"key":"nceacbab9bib34","first-page":"pp 2704","article-title":"Quantization and training of neural networks for efficient integer-arithmetic-only inference","author":"Jacob","year":"2018"},{"key":"nceacbab9bib35","doi-asserted-by":"publisher","first-page":"3249","DOI":"10.1109\/TPDS.2022.3149787","volume":"33","author":"Wang","year":"2022","journal-title":"IEEE Trans. on Parallel Distrib. Syst."},{"key":"nceacbab9bib36","volume":"vol 31","author":"Banner","year":"2018"},{"key":"nceacbab9bib37","author":"Zamirai","year":"2020"},{"key":"nceacbab9bib38","author":"Micikevicius","year":"2017"},{"key":"nceacbab9bib39","volume":"vol 31","author":"Wang","year":"2018"},{"key":"nceacbab9bib40","author":"Courbariaux","year":"2014"},{"key":"nceacbab9bib41","first-page":"pp 1","article-title":"A 28nm 29.2TFLOPS\/W BF16 and 36.5TOPS\/W INT8 reconfigurable digital CIM processor with unified FP\/INT pipeline and bitwise in-memory booth multiplication for cloud deep learning acceleration","volume":"vol 65","author":"Tu","year":"2022"},{"key":"nceacbab9bib42","doi-asserted-by":"publisher","first-page":"99","DOI":"10.1109\/MM.2021.3096236","volume":"42","author":"Lee","year":"2021","journal-title":"IEEE Micro"},{"key":"nceacbab9bib43","first-page":"pp 1","article-title":"A 1ynm 1.25V 8Gb, 16Gb\/s\/pin GDDR6-based accelerator-in-memory supporting 1TFLOPS MAC operation and various activation functions for deep-learning applications","volume":"vol 65","author":"Lee","year":"2022"},{"key":"nceacbab9bib44","author":"Howard","year":"2017"},{"key":"nceacbab9bib45","author":"Gholamalinezhad","year":"2020"},{"key":"nceacbab9bib46","first-page":"pp 374","article-title":"An 8GHz floating-point multiply","author":"Belluomini","year":"2005"},{"key":"nceacbab9bib47","article-title":"A 1.041Mb\/mm2 
27.38TOPS\/W signed-INT8 dynamic logic based ADC-less SRAM compute-in-memory macro in 28nm with reconfigurable bitwise operation for AI and embedded applications","author":"Yan","year":"2022"},{"key":"nceacbab9bib48","author":"Weste","year":"2011","edition":"4th edn"},{"key":"nceacbab9bib49","first-page":"pp 1","article-title":"Lattice: an ADC\/DAC-less ReRAM-based processing-in-memory architecture for accelerating deep convolution neural networks","author":"Zheng","year":"2020"},{"key":"nceacbab9bib50","first-page":"pp 1401","article-title":"Neural networks with digital LUT activation functions","volume":"vol 2","author":"Piazza","year":"1993"},{"key":"nceacbab9bib51","doi-asserted-by":"publisher","first-page":"84","DOI":"10.1145\/3065386","article-title":"ImageNet classification with deep convolutional neural networks","volume":"60","author":"Krizhevsky","year":"2017","journal-title":"Commun. ACM"},{"key":"nceacbab9bib52","first-page":"pp 4510","article-title":"MobileNetV2: inverted residuals and linear bottlenecks","author":"Sandler","year":"2018"},{"key":"nceacbab9bib53","first-page":"pp 10734","article-title":"FBNet: hardware-aware efficient convnet design via differentiable neural architecture search","author":"Wu","year":"2019"},{"key":"nceacbab9bib54","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3476999","volume":"20","author":"Krishnan","year":"2021","journal-title":"ACM Trans. Embedded Comput. Syst."},{"key":"nceacbab9bib55","doi-asserted-by":"publisher","first-page":"3067","DOI":"10.1109\/TCAD.2018.2789723","volume":"37","author":"Chen","year":"2018","journal-title":"IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst."},{"key":"nceacbab9bib56","doi-asserted-by":"publisher","first-page":"1009","DOI":"10.1109\/TCAD.2017.2729466","volume":"37","author":"Xia","year":"2017","journal-title":"IEEE Trans. Comput.-Aided Des. Integr. 
Circuits Syst."},{"key":"nceacbab9bib57","first-page":"pp 83","article-title":"MNSIM 2.0: a behavior-level modeling tool for memristor-based neuromorphic computing systems","author":"Zhu","year":"2020"},{"key":"nceacbab9bib58","author":"Simonyan","year":"2014"},{"key":"nceacbab9bib59","first-page":"pp 1","article-title":"Going deeper with convolutions","author":"Szegedy","year":"2015"},{"key":"nceacbab9bib60","first-page":"pp 770","article-title":"Deep residual learning for image recognition","author":"He","year":"2016"},{"key":"nceacbab9bib61","first-page":"pp 252","article-title":"An 89TOPS\/W and 16.3TOPS\/mm2 all-digital SRAM-based full-precision compute-in memory macro in 22nm for machine-learning edge applications","volume":"vol 64","author":"Chih","year":"2021"}],"container-title":["Neuromorphic Computing and Engineering"],"original-title":[],"link":[{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2634-4386\/acbab9","content-type":"text\/html","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2634-4386\/acbab9\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2634-4386\/acbab9","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2634-4386\/acbab9\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2634-4386\/acbab9\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2634-4386\/acbab9\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2634-4386\/acbab9\/pdf","content-type":"app
lication\/pdf","content-version":"am","intended-application":"similarity-checking"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2634-4386\/acbab9\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,10,11]],"date-time":"2023-10-11T09:03:55Z","timestamp":1697015035000},"score":1,"resource":{"primary":{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2634-4386\/acbab9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,2,24]]},"references-count":61,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,2,24]]},"published-print":{"date-parts":[[2023,3,1]]}},"URL":"https:\/\/doi.org\/10.1088\/2634-4386\/acbab9","relation":{},"ISSN":["2634-4386"],"issn-type":[{"value":"2634-4386","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,2,24]]},"assertion":[{"value":"Hadamard product-based in-memory computing design for floating point neural network training","name":"article_title","label":"Article Title"},{"value":"Neuromorphic Computing and Engineering","name":"journal_title","label":"Journal Title"},{"value":"paper","name":"article_type","label":"Article Type"},{"value":"\u00a9 2023 The Author(s). Published by IOP Publishing Ltd","name":"copyright_information","label":"Copyright Information"},{"value":"2022-11-17","name":"date_received","label":"Date Received","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2023-02-09","name":"date_accepted","label":"Date Accepted","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2023-02-24","name":"date_epub","label":"Online publication date","group":{"name":"publication_dates","label":"Publication dates"}}]}}