{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,23]],"date-time":"2026-04-23T07:59:55Z","timestamp":1776931195088,"version":"3.51.2"},"publisher-location":"New York, NY, USA","reference-count":45,"publisher":"ACM","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,12,9]]},"DOI":"10.1145\/3743093.3770926","type":"proceedings-article","created":{"date-parts":[[2025,12,6]],"date-time":"2025-12-06T08:08:11Z","timestamp":1765008491000},"page":"1-5","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["ReSSFormer: A Recursive Sparse Structured Transformer for Scalable and Long-Context Reasoning"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-9178-2912","authenticated-orcid":false,"given":"Haochen","family":"You","sequence":"first","affiliation":[{"name":"Graduate School of Arts and Sciences, Columbia University, New York City, New York, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-1444-7267","authenticated-orcid":false,"given":"Baojing","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Artificial Intelligence, Hebei Institute of Communications, Shijiazhuang, Hebei Province, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,12,6]]},"reference":[{"key":"e_1_3_3_1_2_2","doi-asserted-by":"crossref","unstructured":"Joshua Ainslie Tao Lei Michiel de Jong Santiago Onta\u00f1\u00f3n Siddhartha Brahma Yury Zemlyanskiy David Uthus Mandy Guo James Lee-Thorp Yi Tay et\u00a0al. 2023. Colt5: Faster long-range transformers with conditional computation. 
arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2303.09752 (2023).","DOI":"10.18653\/v1\/2023.emnlp-main.309"},{"key":"e_1_3_3_1_3_2","unstructured":"Iz Beltagy Matthew\u00a0E Peters and Arman Cohan. 2020. Longformer: The long-document transformer. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2004.05150 (2020)."},{"key":"e_1_3_3_1_4_2","unstructured":"Wenhu Chen Hongmin Wang Jianshu Chen Yunkai Zhang Hong Wang Shiyang Li Xiyou Zhou and William\u00a0Yang Wang. 2019. Tabfact: A large-scale dataset for table-based fact verification. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/1909.02164 (2019)."},{"key":"e_1_3_3_1_5_2","unstructured":"Krzysztof Choromanski Valerii Likhosherstov David Dohan Xingyou Song Andreea Gane Tamas Sarlos Peter Hawkins Jared Davis Afroz Mohiuddin Lukasz Kaiser et\u00a0al. 2020. Rethinking attention with performers. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2009.14794 (2020)."},{"key":"e_1_3_3_1_6_2","unstructured":"Karl Cobbe Vineet Kosaraju Mohammad Bavarian Mark Chen Heewoo Jun Lukasz Kaiser Matthias Plappert Jerry Tworek Jacob Hilton Reiichiro Nakano et\u00a0al. 2021. Training verifiers to solve math word problems. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2110.14168 (2021)."},{"key":"e_1_3_3_1_7_2","unstructured":"William Fedus Barret Zoph and Noam Shazeer. 2022. Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity. Journal of Machine Learning Research 23 120 (2022) 1\u201339."},{"key":"e_1_3_3_1_8_2","unstructured":"Leo Gao Stella Biderman Sid Black Laurence Golding Travis Hoppe Charles Foster Jason Phang Horace He Anish Thite Noa Nabeshima et\u00a0al. 2020. The pile: An 800gb dataset of diverse text for language modeling. 
arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2101.00027 (2020)."},{"key":"e_1_3_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP48485.2024.10446483"},{"key":"e_1_3_3_1_10_2","unstructured":"Jordan Hoffmann Sebastian Borgeaud Arthur Mensch Elena Buchatskaya Trevor Cai Eliza Rutherford Diego de\u00a0Las Casas Lisa\u00a0Anne Hendricks Johannes Welbl Aidan Clark et\u00a0al. 2022. Training compute-optimal large language models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2203.15556 (2022)."},{"key":"e_1_3_3_1_11_2","unstructured":"Weihua Hu Matthias Fey Marinka Zitnik Yuxiao Dong Hongyu Ren Bowen Liu Michele Catasta and Jure Leskovec. 2020. Open graph benchmark: Datasets for machine learning on graphs. Advances in neural information processing systems 33 (2020) 22118\u201322133."},{"key":"e_1_3_3_1_12_2","unstructured":"Xin Huang Ashish Khetan Milan Cvitkovic and Zohar Karnin. 2020. Tabtransformer: Tabular data modeling using contextual embeddings. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2012.06678 (2020)."},{"key":"e_1_3_3_1_13_2","unstructured":"Yinan Huang William Lu Joshua Robinson Yu Yang Muhan Zhang Stefanie Jegelka and Pan Li. 2023. On the stability of expressive positional encodings for graphs. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2310.02579 (2023)."},{"key":"e_1_3_3_1_14_2","unstructured":"Jared Kaplan Sam McCandlish Tom Henighan Tom\u00a0B Brown Benjamin Chess Rewon Child Scott Gray Alec Radford Jeffrey Wu and Dario Amodei. 2020. Scaling laws for neural language models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2001.08361 (2020)."},{"key":"e_1_3_3_1_15_2","unstructured":"Nikita Kitaev \u0141ukasz Kaiser and Anselm Levskaya. 2020. Reformer: The efficient transformer. 
arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2001.04451 (2020)."},{"key":"e_1_3_3_1_16_2","doi-asserted-by":"crossref","unstructured":"Tom\u00e1\u0161 Ko\u010disk\u1ef3 Jonathan Schwarz Phil Blunsom Chris Dyer Karl\u00a0Moritz Hermann G\u00e1bor Melis and Edward Grefenstette. 2018. The narrativeqa reading comprehension challenge. Transactions of the Association for Computational Linguistics 6 (2018) 317\u2013328.","DOI":"10.1162\/tacl_a_00023"},{"key":"e_1_3_3_1_17_2","unstructured":"Yuqi Li Kai Li Xin Yin Zhifei Yang Junhao Dong Zeyu Dong Chuanguang Yang Yingli Tian and Yao Lu. 2025. Sepprune: Structured pruning for efficient deep speech separation. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2505.12079 (2025)."},{"key":"e_1_3_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/3731715.3733294"},{"key":"e_1_3_3_1_19_2","unstructured":"Yuqi Li Yao Lu Zeyu Dong Chuanguang Yang Yihao Chen and Jianping Gou. 2024. Sglp: A similarity guided fast layer partition pruning for compressing large deep models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2410.14720 (2024)."},{"key":"e_1_3_3_1_20_2","unstructured":"Yuqi Li Chuanguang Yang Hansheng Zeng Zeyu Dong Zhulin An Yongjun Xu Yingli Tian and Hao Wu. 2025. Frequency-aligned knowledge distillation for lightweight spatiotemporal forecasting. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2507.02939 (2025)."},{"key":"e_1_3_3_1_21_2","unstructured":"Chaofan Lin Jiaming Tang Shuo Yang Hanshuo Wang Tian Tang Boyu Tian Ion Stoica Song Han and Mingyu Gao. 2025. Twilight: Adaptive Attention Sparsity with Hierarchical Top-p Pruning. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2502.02770 (2025)."},{"key":"e_1_3_3_1_22_2","unstructured":"Ilya Loshchilov and Frank Hutter. 2017. Decoupled weight decay regularization. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/1711.05101 (2017)."},{"key":"e_1_3_3_1_23_2","unstructured":"Chao Lou Zixia Jia Zilong Zheng and Kewei Tu. 2024. 
Sparser is faster and less is more: Efficient sparse attention for long-range transformers. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2406.16747 (2024)."},{"key":"e_1_3_3_1_24_2","doi-asserted-by":"crossref","unstructured":"Yuankai Luo Hongkang Li Lei Shi and Xiao-Ming Wu. 2024. Enhancing graph transformers with hierarchical distance structural encoding. Advances in Neural Information Processing Systems 37 (2024) 57150\u201357182.","DOI":"10.52202\/079017-1821"},{"key":"e_1_3_3_1_25_2","unstructured":"Mattia Opper Roland Fernandez Paul Smolensky and Jianfeng Gao. 2025. TRA: Better Length Generalisation with Threshold Relative Attention. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2503.23174 (2025)."},{"key":"e_1_3_3_1_26_2","unstructured":"Xinghan Pan. 2025. Enhancing RWKV-based Language Models for Long-Sequence Text Generation. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2502.15485 (2025)."},{"key":"e_1_3_3_1_27_2","unstructured":"Piotr Pi\u0119kos R\u00f3bert Csord\u00e1s and J\u00fcrgen Schmidhuber. 2025. Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2505.00315 (2025)."},{"key":"e_1_3_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/3696410.3714805"},{"key":"e_1_3_3_1_29_2","first-page":"8748","volume-title":"International conference on machine learning","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong\u00a0Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et\u00a0al. 2021. Learning transferable visual models from natural language supervision. In International conference on machine learning. PMLR, 8748\u20138763."},{"key":"e_1_3_3_1_30_2","unstructured":"Alec Radford Karthik Narasimhan Tim Salimans Ilya Sutskever et\u00a0al. 2018. Improving language understanding by generative pre-training. 
(2018)."},{"key":"e_1_3_3_1_31_2","unstructured":"Alec Radford Jeffrey Wu Rewon Child David Luan Dario Amodei Ilya Sutskever et\u00a0al. 2019. Language models are unsupervised multitask learners. OpenAI blog 1 8 (2019) 9."},{"key":"e_1_3_3_1_32_2","unstructured":"Jack\u00a0W Rae Anna Potapenko Siddhant\u00a0M Jayakumar and Timothy\u00a0P Lillicrap. 2019. Compressive transformers for long-range sequence modelling. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/1911.05507 (2019)."},{"key":"e_1_3_3_1_33_2","doi-asserted-by":"crossref","unstructured":"Peter Shaw Jakob Uszkoreit and Ashish Vaswani. 2018. Self-attention with relative position representations. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/1803.02155 (2018).","DOI":"10.18653\/v1\/N18-2074"},{"key":"e_1_3_3_1_34_2","doi-asserted-by":"crossref","unstructured":"Jianlin Su Murtadha Ahmed Yu Lu Shengfeng Pan Wen Bo and Yunfeng Liu. 2024. Roformer: Enhanced transformer with rotary position embedding. Neurocomputing 568 (2024) 127063.","DOI":"10.1016\/j.neucom.2023.127063"},{"key":"e_1_3_3_1_35_2","unstructured":"Haoyu Wang Peihao Wang Mufei Li Shikun Liu Siqi Miao Zhangyang Wang and Pan Li. 2025. Graph-KV: Breaking Sequence via Injecting Structural Biases into Large Language Models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2506.07334 (2025)."},{"key":"e_1_3_3_1_36_2","unstructured":"Xinyi Wu Yifei Wang Stefanie Jegelka and Ali Jadbabaie. 2025. On the emergence of position bias in transformers. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2502.01951 (2025)."},{"key":"e_1_3_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33017305"},{"key":"e_1_3_3_1_38_2","doi-asserted-by":"crossref","unstructured":"Peng Xu Xiatian Zhu and David\u00a0A Clifton. 2023. Multimodal learning with transformers: A survey. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45 10 (2023) 12113\u201312132.","DOI":"10.1109\/TPAMI.2023.3275156"},{"key":"e_1_3_3_1_39_2","doi-asserted-by":"crossref","unstructured":"Zhilin Yang Peng Qi Saizheng Zhang Yoshua Bengio William\u00a0W Cohen Ruslan Salakhutdinov and Christopher\u00a0D Manning. 2018. HotpotQA: A dataset for diverse explainable multi-hop question answering. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/1809.09600 (2018).","DOI":"10.18653\/v1\/D18-1259"},{"key":"e_1_3_3_1_40_2","unstructured":"Chengxuan Ying Tianle Cai Shengjie Luo Shuxin Zheng Guolin Ke Di He Yanming Shen and Tie-Yan Liu. 2021. Do transformers really perform badly for graph representation? Advances in neural information processing systems 34 (2021) 28877\u201328888."},{"key":"e_1_3_3_1_41_2","first-page":"59","volume-title":"International Conference on Neural Information Processing","author":"You Haochen","year":"2024","unstructured":"Haochen You and Baojing Liu. 2024. Application of pseudometric functions in clustering and a novel similarity measure based on path information discrepancy. In International Conference on Neural Information Processing. Springer, 59\u201373."},{"key":"e_1_3_3_1_42_2","unstructured":"Haochen You and Baojing Liu. 2025. Mover: Multimodal optimal transport with volume-based embedding regularization. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2508.12149 (2025)."},{"key":"e_1_3_3_1_43_2","unstructured":"Haochen You Baojing Liu and Hongyang He. 2025. Modular MeanFlow: Towards Stable and Scalable One-Step Generative Modeling. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2508.17426 (2025)."},{"key":"e_1_3_3_1_44_2","unstructured":"Manzil Zaheer Guru Guruganesh Kumar\u00a0Avinava Dubey Joshua Ainslie Chris Alberti Santiago Ontanon Philip Pham Anirudh Ravula Qifan Wang Li Yang et\u00a0al. 2020. Big bird: Transformers for longer sequences. 
Advances in neural information processing systems 33 (2020) 17283\u201317297."},{"key":"e_1_3_3_1_45_2","doi-asserted-by":"crossref","unstructured":"Chaoran Zhang Lixin Zou Dan Luo Min Tang Xiangyang Luo Zihao Li and Chenliang Li. 2024. Efficient sparse attention needs adaptive token release. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2407.02328 (2024).","DOI":"10.18653\/v1\/2024.findings-acl.837"},{"key":"e_1_3_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00285"}],"event":{"name":"MMAsia '25: ACM Multimedia Asia","location":"Kuala Lumpur Malaysia","acronym":"MMAsia '25","sponsor":["SIGMM ACM Special Interest Group on Multimedia"]},"container-title":["Proceedings of the 7th ACM International Conference on Multimedia in Asia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3743093.3770926","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,6]],"date-time":"2025-12-06T08:09:16Z","timestamp":1765008556000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3743093.3770926"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,6]]},"references-count":45,"alternative-id":["10.1145\/3743093.3770926","10.1145\/3743093"],"URL":"https:\/\/doi.org\/10.1145\/3743093.3770926","relation":{},"subject":[],"published":{"date-parts":[[2025,12,6]]},"assertion":[{"value":"2025-12-06","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}