{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,2]],"date-time":"2026-07-02T05:20:00Z","timestamp":1782969600982,"version":"3.54.5"},"reference-count":65,"publisher":"Springer Science and Business Media LLC","issue":"9","license":[{"start":{"date-parts":[[2025,9,8]],"date-time":"2025-09-08T00:00:00Z","timestamp":1757289600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,9,8]],"date-time":"2025-09-08T00:00:00Z","timestamp":1757289600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100002347","name":"Bundesministerium f\u00fcr Bildung und Forschung","doi-asserted-by":"publisher","award":["Neurotec II (Project number: 16ME0398K)"],"award-info":[{"award-number":["Neurotec II (Project number: 16ME0398K)"]}],"id":[{"id":"10.13039\/501100002347","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100002347","name":"Bundesministerium f\u00fcr Bildung und Forschung","doi-asserted-by":"publisher","award":["Project number: 16ME0398K"],"award-info":[{"award-number":["Project number: 16ME0398K"]}],"id":[{"id":"10.13039\/501100002347","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Nat Comput Sci"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Transformer networks, driven by self-attention, are central to large language models. In generative transformers, self-attention uses cache memory to store token projections, avoiding recomputation at each time step. However, graphics processing unit (GPU)-stored projections must be loaded into static random-access memory for each new generation step, causing latency and energy bottlenecks. 
Here we present a custom self-attention in-memory computing architecture based on emerging charge-based memories called gain cells, which can be efficiently written to store new tokens during sequence generation and enable parallel analog dot-product computation required for self-attention. However, the analog gain-cell circuits introduce non-idealities and constraints preventing the direct mapping of pre-trained models. To circumvent this problem, we design an initialization algorithm achieving text-processing performance comparable to GPT-2 without training from scratch. Our architecture reduces attention latency and energy consumption by up to two and four orders of magnitude, respectively, compared with GPUs, marking a substantial step toward ultrafast, low-power generative transformers.<\/jats:p>","DOI":"10.1038\/s43588-025-00854-1","type":"journal-article","created":{"date-parts":[[2025,9,8]],"date-time":"2025-09-08T11:22:48Z","timestamp":1757330568000},"page":"813-824","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":17,"title":["Analog in-memory computing attention mechanism for fast and energy-efficient large language 
models"],"prefix":"10.1038","volume":"5","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3672-0870","authenticated-orcid":false,"given":"Nathan","family":"Leroux","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6998-3066","authenticated-orcid":false,"given":"Paul-Philipp","family":"Manea","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Chirag","family":"Sudarshan","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4556-3758","authenticated-orcid":false,"given":"Jan","family":"Finkbeiner","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Sebastian","family":"Siegel","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"John Paul","family":"Strachan","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0332-3273","authenticated-orcid":false,"given":"Emre","family":"Neftci","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2025,9,8]]},"reference":[{"key":"854_CR1","unstructured":"Vaswani, A. et al. Attention is all you need. In Proc. 31st International Conference on Neural Information Processing Systems, NIPS\u201917 6000\u20136010 (Curran Associates, 2017)."},{"key":"854_CR2","unstructured":"Bahdanau, D., Cho, K. & Bengio, Y. Neural machine translation by jointly learning to align and translate. Preprint at http:\/\/arxiv.org\/abs\/1409.0473 (2016)."},{"key":"854_CR3","doi-asserted-by":"publisher","first-page":"111","DOI":"10.1016\/j.aiopen.2022.10.001","volume":"3","author":"T Lin","year":"2022","unstructured":"Lin, T., Wang, Y., Liu, X. & Qiu, X. A survey of transformers. 
AI Open 3, 111\u2013132 (2022).","journal-title":"AI Open"},{"key":"854_CR4","first-page":"606","volume":"5","author":"R Pope","year":"2023","unstructured":"Pope, R. et al. Efficiently scaling transformer inference. Proc. Mach. Learn. Syst. 5, 606\u2013624 (2023).","journal-title":"Proc. Mach. Learn. Syst."},{"key":"854_CR5","unstructured":"Liu, Z. et al. KIVI: a tuning-free asymmetric 2bit quantization for KV cache. In Proc. 41st International Conference on Machine Learning, ICML\u201924 Vol. 235, 32332\u201332344 (JMLR.org, 2024)."},{"key":"854_CR6","unstructured":"Jiang, A.Q. et al. Mistral 7B. Preprint at http:\/\/arxiv.org\/abs\/2310.06825 (2023)."},{"key":"854_CR7","doi-asserted-by":"publisher","unstructured":"Jouppi, N. P. et al. Ten lessons from three generations shaped Google\u2019s TPUv4i: industrial product. In Proc. 2021 ACM\/IEEE 48th Annual International Symposium on Computer Architecture (ISCA) 1\u201314 (IEEE, 2021); https:\/\/doi.org\/10.1109\/ISCA52012.2021.00010","DOI":"10.1109\/ISCA52012.2021.00010"},{"key":"854_CR8","unstructured":"Fu, Y. Challenges in deploying long-context transformers: a theoretical peak performance analysis. Preprint at https:\/\/arxiv.org\/abs\/2405.08944 (2024)."},{"key":"854_CR9","doi-asserted-by":"publisher","first-page":"110","DOI":"10.1145\/3706418","volume":"57","author":"M Xu","year":"2025","unstructured":"Xu, M. et al. Resource-efficient algorithms and systems of foundation models: a survey. ACM Comput. Surv. 57, 110\u2013111039 (2025).","journal-title":"ACM Comput. Surv."},{"key":"854_CR10","doi-asserted-by":"publisher","unstructured":"Katharopoulos, A., Vyas, A., Pappas, N. & Fleuret, F. Transformers are RNNs: fast autoregressive transformers with linear attention. In Proc. 37th International Conference on Machine Learning, ICML\u201920 Vol. 119, 5156\u20135165 (JMLR.org, 2020); https:\/\/doi.org\/10.5555\/3524938.3525416","DOI":"10.5555\/3524938.3525416"},{"key":"854_CR11","unstructured":"Gu, A. & Dao, T. 
Mamba: linear-time sequence modeling with selective state spaces. In Proc. Conference on Language Modeling (2024); https:\/\/openreview.net\/forum?id=tEYskw1VY2"},{"key":"854_CR12","first-page":"114","volume":"6","author":"M Adnan","year":"2024","unstructured":"Adnan, M. et al. Keyformer: KV cache reduction through key tokens selection for efficient generative inference. Proc. Mach. Learn. Syst. 6, 114\u2013127 (2024).","journal-title":"Proc. Mach. Learn. Syst."},{"key":"854_CR13","unstructured":"DeepSeek-AI et al. Deepseek-v3 technical report. Preprint at https:\/\/arxiv.org\/abs\/2412.19437 (2024)"},{"key":"854_CR14","unstructured":"Chang, C.-C. et al. Palu: KV-cache compression with low-rank projection. In Proc. 13th International Conference on Learning Representations (2025); https:\/\/openreview.net\/forum?id=LWMS4pk2vK"},{"key":"854_CR15","doi-asserted-by":"publisher","unstructured":"Ainslie, J. et al. GQA: training generalized multi-query transformer models from multi-head checkpoints. In Proc. 2023 Conference on Empirical Methods in Natural Language Processing (eds Bouamor, H. et al.) 4895\u20134901 (Association for Computational Linguistics, 2023); https:\/\/doi.org\/10.18653\/v1\/2023.emnlp-main.298","DOI":"10.18653\/v1\/2023.emnlp-main.298"},{"key":"854_CR16","unstructured":"Vogginger, B. et al. Neuromorphic hardware for sustainable AI data centers. Preprint at https:\/\/arxiv.org\/abs\/2402.02521 (2024)."},{"key":"854_CR17","doi-asserted-by":"publisher","unstructured":"Yang, X., Yan, B., Li, H., Chen, Y. ReTransformer: ReRAM-based processing-in-memory architecture for transformer acceleration. In Proc. 
39th International Conference on Computer-Aided Design, ICCAD \u201920 92 (Association for Computing Machinery, 2020); https:\/\/doi.org\/10.1145\/3400302.3415640","DOI":"10.1145\/3400302.3415640"},{"key":"854_CR18","doi-asserted-by":"publisher","first-page":"847069","DOI":"10.3389\/felec.2022.847069","volume":"3","author":"AF Laguna","year":"2022","unstructured":"Laguna, A. F. Hardware\u2013software co-design of an in-memory transformer network accelerator. Front. Electron. 3, 847069 (2022).","journal-title":"Front. Electron."},{"key":"854_CR19","doi-asserted-by":"publisher","first-page":"1223","DOI":"10.1109\/TVLSI.2023.3282046","volume":"31","author":"S Sridharan","year":"2023","unstructured":"Sridharan, S., Stevens, J. R., Roy, K. & Raghunathan, A. X-former: in-memory acceleration of transformers. IEEE Trans. Very Large Scale Integr. VLSI Syst. 31, 1223\u20131233 (2023).","journal-title":"IEEE Trans. Very Large Scale Integr. VLSI Syst."},{"key":"854_CR20","doi-asserted-by":"publisher","first-page":"592","DOI":"10.1109\/TCAD.2024.3435762","volume":"44","author":"A Bhattacharjee","year":"2025","unstructured":"Bhattacharjee, A., Moitra, A. & Panda, P. Clipformer: key\u2013value clipping of transformers on memristive crossbars for write noise mitigation. IEEE Trans. Comput. Aided Design Integr. Circuits Syst. 44, 592\u2013601 (2025).","journal-title":"IEEE Trans. Comput. Aided Design Integr. Circuits Syst."},{"key":"854_CR21","doi-asserted-by":"publisher","first-page":"4","DOI":"10.1038\/s44335-024-00004-2","volume":"1","author":"Y Wu","year":"2024","unstructured":"Wu, Y., Wang, Z. & Lu, W. D. PIM GPT a hybrid process in memory accelerator for autoregressive transformers. Npj Unconv. Comput. 1, 4 (2024).","journal-title":"Npj Unconv. Comput."},{"key":"854_CR22","doi-asserted-by":"publisher","first-page":"529","DOI":"10.1038\/s41565-020-0655-z","volume":"15","author":"A Sebastian","year":"2020","unstructured":"Sebastian, A., Le Gallo, M., Khaddam-Aljameh, R. 
& Eleftheriou, E. Memory devices and applications for in-memory computing. Nat. Nanotechnol. 15, 529\u2013544 (2020).","journal-title":"Nat. Nanotechnol."},{"key":"854_CR23","doi-asserted-by":"publisher","unstructured":"Zhou, M., Xu, W., Kang, J. & Rosing, T. TransPIM: a memory-based acceleration via software\u2013hardware co-design for transformer. In Proc. 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA) 1071\u20131085 (IEEE, 2022); https:\/\/doi.org\/10.1109\/HPCA53966.2022.00082","DOI":"10.1109\/HPCA53966.2022.00082"},{"key":"854_CR24","doi-asserted-by":"publisher","first-page":"269","DOI":"10.1109\/TVLSI.2023.3337777","volume":"32","author":"S Liu","year":"2024","unstructured":"Liu, S. et al. HARDSEA: hybrid analog-ReRAM clustering and digital-SRAM in-memory computing accelerator for dynamic sparse self-attention in transformer. IEEE Trans. Very Large Scale Integr. VLSI Syst. 32, 269\u2013282 (2024).","journal-title":"IEEE Trans. Very Large Scale Integr. VLSI Syst."},{"key":"854_CR25","doi-asserted-by":"publisher","first-page":"587","DOI":"10.1109\/JEDS.2023.3265875","volume":"11","author":"N Lepri","year":"2023","unstructured":"Lepri, N. et al. In-memory computing for machine learning and deep learning. IEEE J. Electron Devices Soc. 11, 587\u2013601 (2023).","journal-title":"IEEE J. Electron Devices Soc."},{"key":"854_CR26","doi-asserted-by":"publisher","unstructured":"Wang, Y. et al. An in-memory computing architecture based on two-dimensional semiconductors for multiply\u2013accumulate operations. Nat. Commun. https:\/\/doi.org\/10.1038\/s41467-021-23719-3 (2021).","DOI":"10.1038\/s41467-021-23719-3"},{"key":"854_CR27","doi-asserted-by":"publisher","first-page":"20220071","DOI":"10.1360\/nso\/20220071","volume":"2","author":"S Gou","year":"2023","unstructured":"Gou, S. et al. 2T1C DRAM based on semiconducting MoS2 and semimetallic graphene for in-memory computing. Natl Sci. 
Open 2, 20220071 (2023).","journal-title":"Natl Sci. Open"},{"key":"854_CR28","doi-asserted-by":"publisher","unstructured":"Shi, M. et al. Counteractive coupling IGZO\/CNT hybrid 2T0C DRAM accelerating RRAM-based computing-in-memory via monolithic 3D integration for edge AI. In Proc. 2023 International Electron Devices Meeting (IEDM) 1\u20134 (IEEE, 2023); https:\/\/doi.org\/10.1109\/IEDM45741.2023.10413876","DOI":"10.1109\/IEDM45741.2023.10413876"},{"key":"854_CR29","doi-asserted-by":"publisher","unstructured":"Belmonte, A. et al. Lowest IOFF <3\u00d710\u221221 A\/\u03bcm in capacitorless DRAM achieved by reactive ion etch of IGZO-TFT. In Proc. 2023 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits) 1\u20132 (IEEE, 2023); https:\/\/doi.org\/10.23919\/VLSITechnologyandCir57934.2023.10185398","DOI":"10.23919\/VLSITechnologyandCir57934.2023.10185398"},{"key":"854_CR30","doi-asserted-by":"publisher","unstructured":"Ye, H. et al. Double-gate W-doped amorphous indium oxide transistors for monolithic 3D capacitorless gain cell eDRAM. In Proc. 2020 IEEE International Electron Devices Meeting (IEDM) 28.3.\u201328.3.4 (IEEE, 2020); https:\/\/doi.org\/10.1109\/IEDM13553.2020.9371981","DOI":"10.1109\/IEDM13553.2020.9371981"},{"key":"854_CR31","doi-asserted-by":"publisher","unstructured":"Raman, S. R. S., Xie, S. & Kulkarni, J. P. Compute-in-eDRAM with backend integrated indium gallium zinc oxide transistors. In Proc. 2021 IEEE International Symposium on Circuits and Systems (ISCAS) 1\u20135 (IEEE, 2021); https:\/\/doi.org\/10.1109\/ISCAS51556.2021.9401798","DOI":"10.1109\/ISCAS51556.2021.9401798"},{"key":"854_CR32","first-page":"5166","volume":"70","author":"W Tang","year":"2023","unstructured":"Tang, W. et al. Low-power and scalable BEOL-compatible IGZO TFT eDRAM-based charge-domain computing. IEEE Trans. Circuits Syst. I 70, 5166\u20135179 (2023).","journal-title":"IEEE Trans. Circuits Syst. 
I"},{"key":"854_CR33","doi-asserted-by":"publisher","first-page":"24","DOI":"10.1038\/s44287-023-00002-9","volume":"1","author":"A Lu","year":"2024","unstructured":"Lu, A. et al. High-speed emerging memories for AI hardware accelerators. Nat. Rev. Electr. Eng. 1, 24\u201334 (2024).","journal-title":"Nat. Rev. Electr. Eng."},{"key":"854_CR34","doi-asserted-by":"publisher","first-page":"290","DOI":"10.1038\/s41928-019-0270-x","volume":"2","author":"F Cai","year":"2019","unstructured":"Cai, F. et al. A fully integrated reprogrammable memristor\u2013CMOS system for efficient multiply\u2013accumulate operations. Nat. Electron. 2, 290\u2013299 (2019).","journal-title":"Nat. Electron."},{"key":"854_CR35","doi-asserted-by":"publisher","first-page":"504","DOI":"10.1038\/s41586-022-04992-8","volume":"608","author":"W Wan","year":"2022","unstructured":"Wan, W. et al. A compute-in-memory chip based on resistive random-access memory. Nature 608, 504\u2013512 (2022).","journal-title":"Nature"},{"key":"854_CR36","doi-asserted-by":"publisher","first-page":"768","DOI":"10.1038\/s41586-023-06337-5","volume":"620","author":"S Ambrogio","year":"2023","unstructured":"Ambrogio, S. et al. An analog-AI chip for energy-efficient speech recognition and transcription. Nature 620, 768\u2013775 (2023).","journal-title":"Nature"},{"key":"854_CR37","doi-asserted-by":"publisher","unstructured":"Vatalaro, M. et al. A low-voltage, low-power reconfigurable current-mode softmax circuit for analog neural networks. Electronics https:\/\/doi.org\/10.3390\/electronics10091004 (2021).","DOI":"10.3390\/electronics10091004"},{"key":"854_CR38","doi-asserted-by":"crossref","unstructured":"Dube, A., Manea, P., Gibertini, P., Covi, E. & Strachan, J. P. Analog softmax with wide input current range for in-memory computing. In Proc. IEEE International Symposium on Circuits and Systems (ISCAS), paper 2530 (2025).","DOI":"10.1109\/ISCAS56072.2025.11043251"},{"key":"854_CR39","unstructured":"Ma, X. et al. 
Mega: moving average equipped gated attention. In Proc. 11th International Conference on Learning Representations (2023); https:\/\/openreview.net\/forum?id=qNLe3iq2El"},{"key":"854_CR40","unstructured":"Ramapuram, J. et al. Theory, analysis, and best practices for sigmoid self-attention. In Proc. 13th International Conference on Learning Representations (2025); https:\/\/openreview.net\/forum?id=Zhdhg6n2OG"},{"key":"854_CR41","unstructured":"Beltagy, I., Peters, M. E. & Cohan, A. Longformer: the long-document transformer. Preprint at https:\/\/arxiv.org\/abs\/2004.05150 (2020)."},{"key":"854_CR42","unstructured":"Gu, X. et al. When attention sink emerges in language models: an empirical view. In Proc. 13th International Conference on Learning Representations (2025); https:\/\/openreview.net\/forum?id=78Nn4QJTEN"},{"key":"854_CR43","unstructured":"Fu, Z. et al. Sliding window attention training for efficient large language models. Preprint at https:\/\/arxiv.org\/abs\/2502.18845 (2025)."},{"key":"854_CR44","unstructured":"Gokaslan, A. & Cohen, V. OpenWebText Corpus. GitHub http:\/\/Skylion007.github.io\/OpenWebTextCorpus (2019)."},{"key":"854_CR45","doi-asserted-by":"publisher","first-page":"3329","DOI":"10.1109\/TED.2024.3372938","volume":"71","author":"S Liu","year":"2024","unstructured":"Liu, S. et al. Design guidelines for oxide semiconductor gain cell memory on a logic platform. IEEE Trans. Electron Devices 71, 3329\u20133335 (2024).","journal-title":"IEEE Trans. Electron Devices"},{"key":"854_CR46","doi-asserted-by":"publisher","unstructured":"Subhechha, S. et al. Demonstration of multilevel multiply accumulate operations for AiMC using engineered a-IGZO transistors-based 2T1C gain cell arrays. In Proc. 2023 IEEE International Memory Workshop (IMW) 1\u20134 (IEEE, 2023); https:\/\/doi.org\/10.1109\/IMW56887.2023.10145946","DOI":"10.1109\/IMW56887.2023.10145946"},{"key":"854_CR47","unstructured":"Brown, T. et al. Language models are few-shot learners. Adv. 
Neural Inf. Process. Syst. 33, 1877\u20131901 (2020)."},{"key":"854_CR48","doi-asserted-by":"crossref","unstructured":"Jacob, B. et al. Quantization and training of neural networks for efficient integer-arithmetic-only inference. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 2704\u20132713 (IEEE, 2018).","DOI":"10.1109\/CVPR.2018.00286"},{"key":"854_CR49","unstructured":"Press, O., Smith, N. A. & Lewis, M. Train short, test long: attention with linear biases enables input length extrapolation. In Proc. International Conference on Learning Representations (2022); https:\/\/openreview.net\/forum?id=R8sQPpGCv0"},{"key":"854_CR50","doi-asserted-by":"publisher","unstructured":"Tillet, P., Kung, H. T. & Cox, D. Triton: an intermediate language and compiler for tiled neural network computations. In Proc. 3rd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages, MAPL 2019 10\u201319 (Association for Computing, 2019); https:\/\/doi.org\/10.1145\/3315508.3329973","DOI":"10.1145\/3315508.3329973"},{"key":"854_CR51","unstructured":"Dao, T. FlashAttention-2: faster attention with better parallelism and work partitioning. In Proc. 12th International Conference on Learning Representations (2024); https:\/\/openreview.net\/forum?id=mZn2Xyh9Ec"},{"key":"854_CR52","unstructured":"Mishkin, D. & Matas, J. All you need is a good init. Preprint at https:\/\/arxiv.org\/abs\/1511.06422 (2015)."},{"key":"854_CR53","doi-asserted-by":"publisher","first-page":"1900","DOI":"10.1109\/TED.2024.3360015","volume":"71","author":"N Lepri","year":"2024","unstructured":"Lepri, N., Glukhov, A., Mannocci, P., Porzani, M. & Ielmini, D. Compact modeling and mitigation of parasitics in crosspoint accelerators of neural networks. IEEE Trans. Electron Devices 71, 1900\u20131906 (2024).","journal-title":"IEEE Trans. Electron Devices"},{"key":"854_CR54","unstructured":"Radford, A. et al. Language models are unsupervised multitask learners. 
OpenAI https:\/\/cdn.openai.com\/better-language-models\/language_models_are_unsupervised_multitask_learners.pdf (2019)."},{"key":"854_CR55","unstructured":"Loshchilov, I. & Hutter, F. Decoupled weight decay regularization. In Proc. International Conference on Learning Representations (2019); https:\/\/openreview.net\/forum?id=Bkg6RiCqY7"},{"key":"854_CR56","unstructured":"Beck, M. et al. xLSTM: extended long short-term memory. In Proc. 38th Annual Conference on Neural Information Processing Systems (2024); https:\/\/openreview.net\/forum?id=ARAxPPIAhq"},{"key":"854_CR57","unstructured":"Clark, P. et al. Think you have solved question answering? Try ARC, the AI2 reasoning challenge. Preprint at https:\/\/arxiv.org\/abs\/1803.05457 (2018)."},{"key":"854_CR58","doi-asserted-by":"publisher","first-page":"99","DOI":"10.1145\/3474381","volume":"64","author":"K Sakaguchi","year":"2021","unstructured":"Sakaguchi, K., Bras, R. L., Bhagavatula, C. & Choi, Y. WinoGrande: an adversarial winograd schema challenge at scale. Commun. ACM 64, 99\u2013106 (2021).","journal-title":"Commun. ACM"},{"key":"854_CR59","doi-asserted-by":"crossref","unstructured":"Zellers, R., Holtzman, A., Bisk, Y., Farhadi, A. & Choi, Y. HellaSwag: can a machine really finish your sentence? In Proc. 57th Annual Meeting of the Association for Computational Linguistics, 4791\u20134800 (ACL, 2019).","DOI":"10.18653\/v1\/P19-1472"},{"key":"854_CR60","doi-asserted-by":"publisher","unstructured":"Paperno, D. et al. The LAMBADA dataset: word prediction requiring a broad discourse context. In Proc. 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (eds Erk, K. & Smith, N. A.) 1525\u20131534 (Association for Computational Linguistics, 2016); https:\/\/doi.org\/10.18653\/v1\/P16-1144","DOI":"10.18653\/v1\/P16-1144"},{"key":"854_CR61","doi-asserted-by":"crossref","unstructured":"Bisk, Y., Zellers, R., Bras, R. L., Gao, J. & Choi, Y. 
PIQA: reasoning about physical commonsense in natural language. In Proc. 34th AAAI Conference on Artificial Intelligence, 7432\u20137439 (AAAI, 2020).","DOI":"10.1609\/aaai.v34i05.6239"},{"key":"854_CR62","unstructured":"Merity, S., Xiong, C., Bradbury, J. & Socher, R. Pointer sentinel mixture models. In Proc. International Conference on Learning Representations (2017); https:\/\/openreview.net\/forum?id=Byj72udxe"},{"key":"854_CR63","doi-asserted-by":"publisher","unstructured":"Leroux, N. et al. Analog in-memory computing attention mechanism for fast and energy-efficient large language models source data. figshare https:\/\/doi.org\/10.6084\/m9.figshare.27763548 (2025).","DOI":"10.6084\/m9.figshare.27763548"},{"key":"854_CR64","doi-asserted-by":"publisher","unstructured":"Gao, L. et al. A framework for few-shot language model evaluation. Zenodo https:\/\/doi.org\/10.5281\/zenodo.5371628 (2025).","DOI":"10.5281\/zenodo.5371628"},{"key":"854_CR65","doi-asserted-by":"publisher","unstructured":"Leroux, N. et al. GainCellAttention. 
Zenodo https:\/\/doi.org\/10.5281\/zenodo.15856645 (2025).","DOI":"10.5281\/zenodo.15856645"}],"container-title":["Nature Computational Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.nature.com\/articles\/s43588-025-00854-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s43588-025-00854-1","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s43588-025-00854-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,22]],"date-time":"2025-12-22T16:07:56Z","timestamp":1766419676000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.nature.com\/articles\/s43588-025-00854-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,8]]},"references-count":65,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2025,9]]}},"alternative-id":["854"],"URL":"https:\/\/doi.org\/10.1038\/s43588-025-00854-1","relation":{"has-preprint":[{"id-type":"doi","id":"10.21203\/rs.3.rs-5461915\/v1","asserted-by":"object"}]},"ISSN":["2662-8457"],"issn-type":[{"value":"2662-8457","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,9,8]]},"assertion":[{"value":"15 November 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 July 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 September 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}]}}