{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T01:09:55Z","timestamp":1760058595437,"version":"build-2065373602"},"reference-count":61,"publisher":"Association for Computing Machinery (ACM)","issue":"OOPSLA1","license":[{"start":{"date-parts":[[2025,4,9]],"date-time":"2025-04-09T00:00:00Z","timestamp":1744156800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Semiconductor Research Corporation, ACE one of the seven centers in JUMP 2.0","award":["CCF-2316233"],"award-info":[{"award-number":["CCF-2316233"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Program. Lang."],"published-print":{"date-parts":[[2025,4,9]]},"abstract":"<jats:p>\n            Multi-head-self-attention (MHSA) mechanisms achieve state-of-the-art (SOTA) performance across natural language processing and vision tasks. However, their quadratic dependence on sequence lengths has bottlenecked inference speeds. To circumvent this bottleneck, researchers have proposed various sparse-MHSA models, where a subset of full attention is computed. Despite their promise, current sparse libraries and compilers do not support high-performance implementations for\n            <jats:italic toggle=\"yes\">diverse<\/jats:italic>\n            sparse-MHSA patterns due to the underlying sparse formats they operate on. On one end, sparse libraries operate on\n            <jats:italic toggle=\"yes\">general sparse formats<\/jats:italic>\n            which target extreme amounts of random sparsity (&lt;10% non-zero values) and have high metadata in\n            <jats:italic toggle=\"yes\">O<\/jats:italic>\n            (\n            <jats:italic toggle=\"yes\">nnzs<\/jats:italic>\n            ). On the other end, hand-written kernels operate on\n            <jats:italic toggle=\"yes\">custom sparse formats<\/jats:italic>\n            which target specific sparse-MHSA patterns. However, the sparsity patterns in sparse-MHSA are moderately sparse (10-50% non-zero values) and varied, resulting in general sparse formats incurring high metadata overhead and custom sparse formats covering few sparse-MSHA patterns, trading off generality for performance.   We bridge this gap, achieving both generality and performance, by proposing a novel sparse format: affine-compressed-sparse-row (ACSR) and supporting code-generation scheme, SPLAT, that generates high-performance implementations for diverse sparse-MHSA patterns on GPUs. Core to our proposed format and code generation algorithm is the observation that common sparse-MHSA patterns have uniquely regular geometric properties. These properties, which can be analyzed just-in-time, expose novel optimizations and tiling strategies that SPLAT exploits to generate high-performance implementations for diverse patterns. 
To demonstrate SPLAT\u2019s efficacy, we use it to generate code for various sparse-MHSA models, achieving speedups of up to 2.05x and 4.05x over hand-written kernels written in Triton and TVM, respectively, on A100 GPUs in single precision.\n          <\/jats:p>","DOI":"10.1145\/3720503","type":"journal-article","created":{"date-parts":[[2025,4,9]],"date-time":"2025-04-09T13:48:26Z","timestamp":1744206506000},"page":"1632-1660","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["SPLAT: A Framework for Optimised GPU Code-Generation for SParse reguLar ATtention"],"prefix":"10.1145","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2664-8545","authenticated-orcid":false,"given":"Ahan","family":"Gupta","sequence":"first","affiliation":[{"name":"University of Illinois at Urbana-Champaign, Champaign, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-7443-6098","authenticated-orcid":false,"given":"Yueming","family":"Yuan","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana-Champaign, Champaign, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-1442-1502","authenticated-orcid":false,"given":"Devansh","family":"Jain","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana-Champaign, Champaign, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2774-8978","authenticated-orcid":false,"given":"Yuhao","family":"Ge","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana-Champaign, Champaign, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-5719-554X","authenticated-orcid":false,"given":"David","family":"Aponte","sequence":"additional","affiliation":[{"name":"Microsoft, Seattle, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2051-7616","authenticated-orcid":false,"given":"Yanqi","family":"Zhou","sequence":"additional","affiliation":[{"name":"Google DeepMind, Mountain View, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8140-2321","authenticated-orcid":false,"given":"Charith","family":"Mendis","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana-Champaign, Champaign, USA"}]}],"member":"320","published-online":{"date-parts":[[2025,4,9]]},"reference":[{"key":"e_1_2_2_1_1","unstructured":"[n. d.]. cuBLAS \u2014 developer.nvidia.com. https:\/\/developer.nvidia.com\/cublas"},{"key":"e_1_2_2_2_1","unstructured":"[n. d.]. cuSPARSE \u2014 developer.nvidia.com. https:\/\/developer.nvidia.com\/cusparse"},{"key":"e_1_2_2_3_1","unstructured":"[n. d.]. SuiteSparse. https:\/\/sparse.tamu.edu\/"},{"key":"e_1_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3620665.3640366"},{"key":"e_1_2_2_5_1","unstructured":"Anthropic. 2023. Introducing 100K Context Windows. https:\/\/www.anthropic.com\/index\/100k-context-windows"},{"key":"e_1_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/3314221.3314615"},{"key":"e_1_2_2_7_1","volume-title":"Emanuele Del Sozzo, Abdurrahman Akkas, Yunming Zhang, Patricia Suriana, Shoaib Kamil, and Saman P. Amarasinghe.","author":"Baghdadi Riyadh","year":"2018","unstructured":"Riyadh Baghdadi, Jessica Ray, Malek Ben Romdhane, Emanuele Del Sozzo, Abdurrahman Akkas, Yunming Zhang, Patricia Suriana, Shoaib Kamil, and Saman P. Amarasinghe. 2018. Tiramisu: A Polyhedral Compiler for Expressing Fast and Portable Code. CoRR, abs\/1804.10694 (2018), arXiv:1804.10694. arxiv:1804.10694"},{"key":"e_1_2_2_8_1","volume-title":"Longformer: The Long-Document Transformer. 
arxiv:2004.05150.","author":"Beltagy Iz","year":"2020","unstructured":"Iz Beltagy, Matthew E. Peters, and Arman Cohan. 2020. Longformer: The Long-Document Transformer. arxiv:2004.05150."},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/1375581.1375595"},{"key":"e_1_2_2_10_1","volume-title":"Chris Leary, Dougal Maclaurin, George Necula, Adam Paszke, Jake VanderPlas, Skye Wanderman-Milne, and Qiao Zhang.","author":"Bradbury James","year":"2018","unstructured":"James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal Maclaurin, George Necula, Adam Paszke, Jake VanderPlas, Skye Wanderman-Milne, and Qiao Zhang. 2018. JAX: composable transformations of Python+NumPy programs. http:\/\/github.com\/google\/jax"},{"key":"e_1_2_2_11_1","volume-title":"Language Models are Few-Shot Learners. CoRR, abs\/2005.14165","author":"Brown Tom B.","year":"2020","unstructured":"Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language Models are Few-Shot Learners. CoRR, abs\/2005.14165 (2020), arXiv:2005.14165. arxiv:2005.14165"},{"key":"e_1_2_2_12_1","volume-title":"Proceedings of the 13th USENIX Conference on Operating Systems Design and Implementation (OSDI\u201918)","author":"Chen Tianqi","year":"2018","unstructured":"Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Meghan Cowan, Haichen Shen, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy. 2018. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning. In Proceedings of the 13th USENIX Conference on Operating Systems Design and Implementation (OSDI\u201918). USENIX Association, USA. 579\u2013594. isbn:9781931971478"},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3126908.3126936"},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2018.00065"},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3581784.3607097"},{"key":"e_1_2_2_16_1","volume-title":"Generating Long Sequences with Sparse Transformers. CoRR, abs\/1904.10509","author":"Child Rewon","year":"2019","unstructured":"Rewon Child, Scott Gray, Alec Radford, and Ilya Sutskever. 2019. Generating Long Sequences with Sparse Transformers. CoRR, abs\/1904.10509 (2019), arXiv:1904.10509. arxiv:1904.10509"},{"key":"e_1_2_2_17_1","volume-title":"Amarasinghe","author":"Chou Stephen","year":"2018","unstructured":"Stephen Chou, Fredrik Kjolstad, and Saman P. Amarasinghe. 2018. Unified Sparse Formats for Tensor Algebra Compilers. CoRR, abs\/1804.10112 (2018), arXiv:1804.10112. arxiv:1804.10112"},{"key":"e_1_2_2_18_1","unstructured":"Guohao Dai Guyue Huang Shang Yang Zhongming Yu Hengrui Zhang Yufei Ding Yuan Xie Huazhong Yang and Yu Wang. 2022. Heuristic Adaptability to Input Dynamics for SpMM on GPUs. arxiv:2202.08556. arxiv:2202.08556"},{"key":"e_1_2_2_19_1","volume-title":"Pixelated Butterfly: Simple and Efficient Sparse training for Neural Network Models. arxiv:2112.00029. 
arxiv:2112.00029","author":"Dao Tri","year":"2022","unstructured":"Tri Dao, Beidi Chen, Kaizhao Liang, Jiaming Yang, Zhao Song, Atri Rudra, and Christopher R\u00e9. 2022. Pixelated Butterfly: Simple and Efficient Sparse training for Neural Network Models. arxiv:2112.00029. arxiv:2112.00029"},{"key":"e_1_2_2_20_1","volume-title":"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. CoRR, abs\/1810.04805","author":"Devlin Jacob","year":"2018","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. CoRR, abs\/1810.04805 (2018), arXiv:1810.04805. arxiv:1810.04805"},{"key":"e_1_2_2_21_1","unstructured":"Jack Dongarra. 1995. Block Compressed Row Storage (BCRS) \u2014 netlib.org. https:\/\/netlib.org\/linalg\/html_templates\/node93.html [Accessed 15-10-2024]"},{"key":"e_1_2_2_22_1","unstructured":"Jack Dongarra. 1995. Compressed Diagonal Storage (CDS) \u2014 netlib.org. https:\/\/netlib.org\/linalg\/html_templates\/node94.html [Accessed 15-10-2024]"},{"key":"e_1_2_2_23_1","unstructured":"Jack Dongarra. 1995. Compressed Row Storage (CRS) \u2014 netlib.org. https:\/\/netlib.org\/linalg\/html_templates\/node91.html [Accessed 15-10-2024]"},{"key":"e_1_2_2_24_1","unstructured":"Clement Farabet and Tris Warkentin. [n. d.]. Gemma 2 is now available to researchers and developers \u2014 blog.google. https:\/\/blog.google\/technology\/developers\/google-gemma-2\/ [Accessed 15-07-2024]"},{"key":"e_1_2_2_25_1","volume-title":"Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture. arxiv:2310.12109. arxiv:2310.12109","author":"Fu Daniel Y.","year":"2023","unstructured":"Daniel Y. Fu, Simran Arora, Jessica Grogan, Isys Johnson, Sabri Eyuboglu, Armin W. Thomas, Benjamin Spector, Michael Poli, Atri Rudra, and Christopher R\u00e9. 2023. Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture. arxiv:2310.12109. arxiv:2310.12109"},{"key":"e_1_2_2_26_1","doi-asserted-by":"crossref","unstructured":"Trevor Gale Matei Zaharia Cliff Young and Erich Elsen. 2020. Sparse GPU Kernels for Deep Learning. arxiv:2006.10901.","DOI":"10.1109\/SC41405.2020.00021"},{"key":"e_1_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.5555\/3433701.3433723"},{"key":"e_1_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3571157"},{"key":"e_1_2_2_29_1","unstructured":"Scott Gray Alec Radford and Durk Kingma. 2017. Block-Sparse GPU Kernels. https:\/\/openai.com\/research\/block-sparse-gpu-kernels"},{"key":"e_1_2_2_30_1","volume-title":"Armin Gr\u00f6\u00df linger, and Christian Lengauer","author":"Grosser Tobias","year":"2012","unstructured":"Tobias Grosser, Armin Gr\u00f6\u00df linger, and Christian Lengauer. 2012. Polly - Performing Polyhedral Optimizations on a Low-Level Intermediate Representation. Parallel Process. Lett., 22 (2012), https:\/\/api.semanticscholar.org\/CorpusID:18533155"},{"key":"e_1_2_2_31_1","unstructured":"Chi Han Qifan Wang Hao Peng Wenhan Xiong Yu Chen Heng Ji and Sinong Wang. 2024. LM-Infinite: Zero-Shot Extreme Length Generalization for Large Language Models. arxiv:2308.16137. arxiv:2308.16137"},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3293883.3295712"},{"key":"e_1_2_2_33_1","unstructured":"Victor Eijkhout Jack Dongarra and Henk van der Vorst. [n. d.]. SparseBench. https:\/\/www.netlib.org\/benchmark\/sparsebench\/"},{"key":"e_1_2_2_34_1","unstructured":"Albert Q. 
Jiang Alexandre Sablayrolles Arthur Mensch Chris Bamford Devendra Singh Chaplot Diego de las Casas Florian Bressand Gianna Lengyel Guillaume Lample Lucile Saulnier L\u00e9lio Renard Lavaud Marie-Anne Lachaux Pierre Stock Teven Le Scao Thibaut Lavril Thomas Wang Timoth\u00e9e Lacroix and William El Sayed. 2023. Mistral 7B. arxiv:2310.06825. arxiv:2310.06825"},{"key":"e_1_2_2_35_1","volume-title":"Reformer: The Efficient Transformer. CoRR, abs\/2001.04451","author":"Kitaev Nikita","year":"2020","unstructured":"Nikita Kitaev, Lukasz Kaiser, and Anselm Levskaya. 2020. Reformer: The Efficient Transformer. CoRR, abs\/2001.04451 (2020), arXiv:2001.04451. arxiv:2001.04451"},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3133901"},{"key":"e_1_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/2775054.2694364"},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/HiPC.2018.00013"},{"key":"e_1_2_2_39_1","unstructured":"NVIDIA. 2020. Ampere Architecture. https:\/\/www.nvidia.com\/en-us\/data-center\/ampere-architecture\/"},{"key":"e_1_2_2_40_1","unstructured":"OpenAI. 2023. GPT-4 Technical Report. arxiv:2303.08774."},{"key":"e_1_2_2_41_1","unstructured":"Jeff Pool Abhishek Sawarkar and Jay Rodge. 2023. Accelerating inference with sparsity using the Nvidia ampere architecture and NVIDIA TENSORRT. https:\/\/developer.nvidia.com\/blog\/accelerating-inference-with-sparsity-using-ampere-and-tensorrt\/"},{"key":"e_1_2_2_42_1","volume-title":"Sinong Wang, and Jie Tang.","author":"Qiu Jiezhong","year":"2020","unstructured":"Jiezhong Qiu, Hao Ma, Omer Levy, Scott Wen tau Yih, Sinong Wang, and Jie Tang. 2020. Blockwise Self-Attention for Long Document Understanding. arxiv:1911.02972. arxiv:1911.02972"},{"key":"e_1_2_2_43_1","unstructured":"Alec Radford Jeff Wu Rewon Child David Luan Dario Amodei and Ilya Sutskever. 2019. Language Models are Unsupervised Multitask Learners. https:\/\/api.semanticscholar.org\/CorpusID:160025533"},{"key":"e_1_2_2_44_1","unstructured":"Aditya Ramesh Mikhail Pavlov Gabriel Goh Scott Gray Chelsea Voss Alec Radford Mark Chen and Ilya Sutskever. 2021. Zero-Shot Text-to-Image Generation. arxiv:2102.12092. arxiv:2102.12092"},{"key":"e_1_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/2858788.2688515"},{"key":"e_1_2_2_46_1","doi-asserted-by":"crossref","unstructured":"Yousef Saad. 2003. Iterative methods for sparse linear systems.. SIAM. isbn:978-0-89871-534-7","DOI":"10.1137\/1.9780898718003"},{"key":"e_1_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3363785"},{"key":"e_1_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.emnlp-main.823"},{"key":"e_1_2_2_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/2833179.2833183"},{"key":"e_1_2_2_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/2838734"},{"key":"e_1_2_2_51_1","unstructured":"Yi Tay Mostafa Dehghani Samira Abnar Yikang Shen Dara Bahri Philip Pham Jinfeng Rao Liu Yang Sebastian Ruder and Donald Metzler. 2020. Long Range Arena: A Benchmark for Efficient Transformers. arxiv:2011.04006. arxiv:2011.04006"},{"key":"e_1_2_2_52_1","volume-title":"Efficient Transformers: A Survey. CoRR, abs\/2009.06732","author":"Tay Yi","year":"2020","unstructured":"Yi Tay, Mostafa Dehghani, Dara Bahri, and Donald Metzler. 2020. Efficient Transformers: A Survey. CoRR, abs\/2009.06732 (2020), arXiv:2009.06732. 
arxiv:2009.06732"},{"key":"e_1_2_2_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/3315508.3329973"},{"key":"e_1_2_2_54_1","volume-title":"Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions. CoRR, abs\/1802.04730","author":"Vasilache Nicolas","year":"2018","unstructured":"Nicolas Vasilache, Oleksandr Zinenko, Theodoros Theodoridis, Priya Goyal, Zachary DeVito, William S. Moses, Sven Verdoolaege, Andrew Adams, and Albert Cohen. 2018. Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions. CoRR, abs\/1802.04730 (2018), arXiv:1802.04730. arxiv:1802.04730"},{"key":"e_1_2_2_55_1","volume-title":"\u0141 ukasz Kaiser, and Illia Polosukhin","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141 ukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems, I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). 30, Curran Associates, Inc.. https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2017\/file\/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf"},{"key":"e_1_2_2_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/3591302"},{"key":"e_1_2_2_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/3582016.3582047"},{"key":"e_1_2_2_58_1","volume-title":"Deepti Raj G, Rutvij H Jhaveri, Prabadevi B, Weizheng Wang, Athanasios V. Vasilakos, and Thippa Reddy Gadekallu.","author":"Yenduri Gokul","year":"2023","unstructured":"Gokul Yenduri, Ramalingam M, Chemmalar Selvi G, Supriya Y, Gautam Srivastava, Praveen Kumar Reddy Maddikunta, Deepti Raj G, Rutvij H Jhaveri, Prabadevi B, Weizheng Wang, Athanasios V. Vasilakos, and Thippa Reddy Gadekallu. 2023. Generative Pre-trained Transformer: A Comprehensive Review on Enabling Technologies, Potential Applications, Emerging Challenges, and Future Directions. arxiv:2305.10435. arxiv:2305.10435"},{"key":"e_1_2_2_59_1","volume-title":"Big Bird: Transformers for Longer Sequences. CoRR, abs\/2007.14062","author":"Zaheer Manzil","year":"2020","unstructured":"Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Onta\u00f1\u00f3n, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, and Amr Ahmed. 2020. Big Bird: Transformers for Longer Sequences. CoRR, abs\/2007.14062 (2020), arXiv:2007.14062. arxiv:2007.14062"},{"key":"e_1_2_2_60_1","volume-title":"16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22)","author":"Zheng Ningxin","year":"2022","unstructured":"Ningxin Zheng, Bin Lin, Quanlu Zhang, Lingxiao Ma, Yuqing Yang, Fan Yang, Yang Wang, Mao Yang, and Lidong Zhou. 2022. SparTA: Deep-Learning Model Sparsity via Tensor-with-Sparsity-Attribute. In 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22). USENIX Association, Carlsbad, CA. 213\u2013232. isbn:978-1-939133-28-1 https:\/\/www.usenix.org\/conference\/osdi22\/presentation\/zheng-ningxin"},{"key":"e_1_2_2_61_1","volume-title":"Long-Short Transformer: Efficient Transformers for Language and Vision. CoRR, abs\/2107.02192","author":"Zhu Chen","year":"2021","unstructured":"Chen Zhu, Wei Ping, Chaowei Xiao, Mohammad Shoeybi, Tom Goldstein, Anima Anandkumar, and Bryan Catanzaro. 2021. Long-Short Transformer: Efficient Transformers for Language and Vision. CoRR, abs\/2107.02192 (2021), arXiv:2107.02192. 
arxiv:2107.02192"}],"container-title":["Proceedings of the ACM on Programming Languages"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3720503","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3720503","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T17:13:53Z","timestamp":1760030033000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3720503"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4,9]]},"references-count":61,"journal-issue":{"issue":"OOPSLA1","published-print":{"date-parts":[[2025,4,9]]}},"alternative-id":["10.1145\/3720503"],"URL":"https:\/\/doi.org\/10.1145\/3720503","relation":{},"ISSN":["2475-1421"],"issn-type":[{"type":"electronic","value":"2475-1421"}],"subject":[],"published":{"date-parts":[[2025,4,9]]},"assertion":[{"value":"2024-10-15","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-02-18","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-04-09","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}