{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,19]],"date-time":"2026-05-19T07:14:53Z","timestamp":1779174893399,"version":"3.51.4"},"reference-count":118,"publisher":"Association for Computing Machinery (ACM)","issue":"3","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2025,6,17]]},"abstract":"<jats:p>\n                    As the field of Large Language Models (LLMs) continues to evolve, the context length in inference is steadily growing. Key-Value Cache (KVCache), the intermediate representations of tokens within LLM inference, has now become the primary memory bottleneck due to limited GPU memory. Current methods selectively determine suitable keys and values for self-attention computation in LLMs to address the issue. However, they either fall short in maintaining model quality or result in high serving latency. Drawing inspiration from advanced embedding retrieval techniques prevalent in the data management community, we consider the storage and retrieval of KVCache as a typical embedding retrieval problem. We propose\n                    <jats:bold>PQCache<\/jats:bold>\n                    , which employs Product Quantization (PQ) to manage KVCache, maintaining model quality while ensuring low serving latency. During the prefilling phase, we apply PQ to tokens' keys for each LLM layer and head. During the autoregressive decoding phase, we use PQ codes and centroids to approximately identify important preceding tokens, then fetch the corresponding key-value pairs for self-attention computation. Through meticulous design of overlapping and caching, we minimize any additional computation and communication overhead during both phases. 
Extensive experiments demonstrate that PQCache achieves both effectiveness and efficiency, with 4.60% score improvement over existing methods on InfiniteBench and low system latency in both prefilling and decoding.\n                  <\/jats:p>","DOI":"10.1145\/3725338","type":"journal-article","created":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T21:23:29Z","timestamp":1750281809000},"page":"1-30","source":"Crossref","is-referenced-by-count":11,"title":["PQCache: Product Quantization-based KVCache for Long Context LLM Inference"],"prefix":"10.1145","volume":"3","author":[{"ORCID":"https:\/\/orcid.org\/0009-0000-4188-7742","authenticated-orcid":false,"given":"Hailin","family":"Zhang","sequence":"first","affiliation":[{"name":"Peking University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-7388-7749","authenticated-orcid":false,"given":"Xiaodong","family":"Ji","sequence":"additional","affiliation":[{"name":"Peking University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-7251-3475","authenticated-orcid":false,"given":"Yilin","family":"Chen","sequence":"additional","affiliation":[{"name":"Peking University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1658-0380","authenticated-orcid":false,"given":"Fangcheng","family":"Fu","sequence":"additional","affiliation":[{"name":"Peking University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9371-8358","authenticated-orcid":false,"given":"Xupeng","family":"Miao","sequence":"additional","affiliation":[{"name":"Purdue University, West Lafayette, 
USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6766-757X","authenticated-orcid":false,"given":"Xiaonan","family":"Nie","sequence":"additional","affiliation":[{"name":"Peking University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-5124-0241","authenticated-orcid":false,"given":"Weipeng","family":"Chen","sequence":"additional","affiliation":[{"name":"Baichuan Inc., Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1681-4677","authenticated-orcid":false,"given":"Bin","family":"Cui","sequence":"additional","affiliation":[{"name":"Peking University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,6,18]]},"reference":[{"key":"e_1_2_2_1_1","volume-title":"TensorFlow: A System for Large-Scale Machine Learning. In 12th USENIX Symposium on Operating Systems Design and Implementation, OSDI.","author":"Abadi Mart\u00edn","year":"2016","unstructured":"Mart\u00edn Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek Gordon Murray, Benoit Steiner, Paul A. Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2016. TensorFlow: A System for Large-Scale Machine Learning. In 12th USENIX Symposium on Operating Systems Design and Implementation, OSDI."},{"key":"e_1_2_2_2_1","volume-title":"Proceedings of the Seventh Annual Conference on Machine Learning and Systems, MLSys.","author":"Adnan Muhammad","year":"2024","unstructured":"Muhammad Adnan, Akhil Arunkumar, Gaurav Jain, Prashant J. Nair, Ilya Soloveychik, and Purushotham Kamath. 2024. Keyformer: KV Cache reduction through key tokens selection for Efficient Generative Inference. 
In Proceedings of the Seventh Annual Conference on Machine Learning and Systems, MLSys."},{"key":"e_1_2_2_3_1","unstructured":"Moonshot AI. 2024. KimiChat. https:\/\/kimi.moonshot.cn\/"},{"key":"e_1_2_2_4_1","volume-title":"The Impact of Large Language Models on Scientific Discovery: a Preliminary Study using GPT-4. CoRR","author":"Science Microsoft Research","year":"2023","unstructured":"Microsoft Research AI4Science and Microsoft Azure Quantum. 2023. The Impact of Large Language Models on Scientific Discovery: a Preliminary Study using GPT-4. CoRR, Vol. abs\/2311.07361 (2023)."},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.14778\/2856318.2856324"},{"key":"e_1_2_2_6_1","unstructured":"Anthropic. 2023. Long context prompting for Claude 2.1. https:\/\/www.anthropic.com\/news\/claude-2--1-prompting"},{"key":"e_1_2_2_7_1","first-page":"8097","article-title":"Vector Search on Billion-Scale Data Collections","volume":"2150","author":"Azizi Ilias","year":"2024","unstructured":"Ilias Azizi. 2024. Vector Search on Billion-Scale Data Collections. Proceedings of the VLDB Endowment. ISSN, Vol. 2150 (2024), 8097.","journal-title":"Proceedings of the VLDB Endowment. ISSN"},{"key":"e_1_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.14778\/3583140.3583166"},{"key":"e_1_2_2_9_1","volume-title":"Dynamic Memory Management for GPU-Based Training of Deep Neural Networks. In IEEE International Parallel and Distributed Processing Symposium, IPDPS.","author":"Anshuj Garg Shriram S. B","year":"2019","unstructured":"Shriram S. B, Anshuj Garg, and Purushottam Kulkarni. 2019. Dynamic Memory Management for GPU-Based Training of Deep Neural Networks. 
In IEEE International Parallel and Distributed Processing Symposium, IPDPS."},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.acl-long.172"},{"key":"e_1_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01258-8_13"},{"key":"e_1_2_2_12_1","first-page":"533","volume-title":"Nature","volume":"619","author":"Bi Kaifeng","year":"2023","unstructured":"Kaifeng Bi, Lingxi Xie, Hengheng Zhang, Xin Chen, Xiaotao Gu, and Qi Tian. 2023. Accurate medium-range global weather forecasting with 3D neural networks. Nature, Vol. 619, 7970 (2023), 533-538."},{"key":"e_1_2_2_13_1","unstructured":"Rishi Bommasani Drew A. Hudson Ehsan Adeli Russ B. Altman Simran Arora Sydney von Arx Michael S. Bernstein Jeannette Bohg Antoine Bosselut Emma Brunskill Erik Brynjolfsson Shyamal Buch Dallas Card Rodrigo Castellon Niladri S. Chatterji Annie S. Chen Kathleen Creel Jared Quincy Davis Dorottya Demszky Chris Donahue Moussa Doumbouya Esin Durmus Stefano Ermon John Etchemendy Kawin Ethayarajh Li Fei-Fei Chelsea Finn Trevor Gale Lauren Gillespie Karan Goel Noah D. Goodman Shelby Grossman Neel Guha Tatsunori Hashimoto Peter Henderson John Hewitt Daniel E. Ho Jenny Hong Kyle Hsu Jing Huang Thomas Icard Saahil Jain Dan Jurafsky Pratyusha Kalluri Siddharth Karamcheti Geoff Keeling Fereshte Khani Omar Khattab Pang Wei Koh Mark S. Krass Ranjay Krishna Rohith Kuditipudi and et al. 2021. On the Opportunities and Risks of Foundation Models. CoRR Vol. abs\/2108.07258 (2021)."},{"key":"e_1_2_2_14_1","volume-title":"PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling. CoRR","author":"Cai Zefan","year":"2024","unstructured":"Zefan Cai, Yichi Zhang, Bofei Gao, Yuliang Liu, Tianyu Liu, Keming Lu, Wayne Xiong, Yue Dong, Baobao Chang, Junjie Hu, and Wen Xiao. 2024. PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling. CoRR, Vol. 
abs\/2406.02069 (2024)."},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.7554\/eLife.82819"},{"key":"e_1_2_2_16_1","volume-title":"SPANN: Highly-efficient Billion-scale Approximate Nearest Neighborhood Search. In Advances in Neural Information Processing Systems, NeurIPS.","author":"Chen Qi","year":"2021","unstructured":"Qi Chen, Bing Zhao, Haidong Wang, Mingqin Li, Chuanjie Liu, Zengzhong Li, Mao Yang, and Jingdong Wang. 2021a. SPANN: Highly-efficient Billion-scale Approximate Nearest Neighborhood Search. In Advances in Neural Information Processing Systems, NeurIPS."},{"key":"e_1_2_2_17_1","volume-title":"Differentiable Product Quantization for End-to-End Embedding Compression. In International Conference on Machine Learning, ICML.","author":"Chen Ting","year":"2020","unstructured":"Ting Chen, Lala Li, and Yizhou Sun. 2020. Differentiable Product Quantization for End-to-End Embedding Compression. In International Conference on Machine Learning, ICML."},{"key":"e_1_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11432-023-4127-5"},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3465998.3466004"},{"key":"e_1_2_2_20_1","unstructured":"Alibaba Cloud. 2024. Tongyi Qianwen. https:\/\/tongyi.aliyun.com\/qianwen\/"},{"key":"e_1_2_2_21_1","volume-title":"12th International Conference on Learning Representations, ICLR.","author":"Dao Tri","year":"2024","unstructured":"Tri Dao. 2024. FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning. In 12th International Conference on Learning Representations, ICLR."},{"key":"e_1_2_2_22_1","doi-asserted-by":"crossref","unstructured":"Tri Dao Daniel Y. Fu Stefano Ermon Atri Rudra and Christopher R\u00e9. 2022. FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness. 
In Advances in Neural Information Processing Systems NeurIPS.","DOI":"10.52202\/068431-1189"},{"key":"e_1_2_2_23_1","volume-title":"Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference. In International Conference on Machine Learning, ICML.","author":"Dong Harry","year":"2024","unstructured":"Harry Dong, Xinyu Yang, Zhenyu Zhang, Zhangyang Wang, Yuejie Chi, and Beidi Chen. 2024b. Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference. In International Conference on Machine Learning, ICML."},{"key":"e_1_2_2_24_1","volume-title":"QAQ: Quality Adaptive Quantization for LLM KV Cache. CoRR","author":"Dong Shichen","year":"2024","unstructured":"Shichen Dong, Wen Cheng, Jiayu Qin, and Wei Wang. 2024a. QAQ: Quality Adaptive Quantization for LLM KV Cache. CoRR, Vol. abs\/2403.04643 (2024)."},{"key":"e_1_2_2_25_1","volume-title":"9th International Conference on Learning Representations, ICLR.","author":"Dosovitskiy Alexey","year":"2021","unstructured":"Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. 2021. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. 
In 9th International Conference on Learning Representations, ICLR."},{"key":"e_1_2_2_26_1","unstructured":"Abhimanyu Dubey Abhinav Jauhri Abhinav Pandey Abhishek Kadian Ahmad Al-Dahle Aiesha Letman Akhil Mathur Alan Schelten Amy Yang Angela Fan Anirudh Goyal Anthony Hartshorn Aobo Yang Archi Mitra Archie Sravankumar Artem Korenev Arthur Hinsvark Arun Rao Aston Zhang Aur\u00e9lien Rodriguez Austen Gregerson Ava Spataru Baptiste Rozi\u00e8re Bethany Biron Binh Tang Bobbie Chern Charlotte Caucheteux Chaya Nayak Chloe Bi Chris Marra Chris McConnell Christian Keller Christophe Touret Chunyang Wu Corinne Wong Cristian Canton Ferrer Cyrus Nikolaidis Damien Allonsius Daniel Song Danielle Pintz Danny Livshits David Esiobu Dhruv Choudhary Dhruv Mahajan Diego Garcia-Olano Diego Perino Dieuwke Hupkes Egor Lakomkin Ehab AlBadawy Elina Lobanova Emily Dinan Eric Michael Smith Filip Radenovic Frank Zhang Gabriel Synnaeve Gabrielle Lee Georgia Lewis Anderson Graeme Nail Gr\u00e9goire Mialon Guan Pang Guillem Cucurell Hailey Nguyen Hannah Korevaar Hu Xu Hugo Touvron Iliyan Zarov Imanol Arrieta Ibarra Isabel M. Kloumann Ishan Misra Ivan Evtimov Jade Copet Jaewon Lee Jan Geffert Jana Vranes Jason Park Jay Mahadeokar Jeet Shah Jelmer van der Linde Jennifer Billock Jenny Hong Jenya Lee Jeremy Fu Jianfeng Chi Jianyu Huang Jiawen Liu Jie Wang Jiecao Yu Joanna Bitton Joe Spisak Jongsoo Park Joseph Rocca Joshua Johnstun Joshua Saxe Junteng Jia Kalyan Vasuden Alwala Kartikeya Upasani Kate Plawiak Ke Li Kenneth Heafield Kevin Stone and et al. 2024. The Llama 3 Herd of Models. CoRR Vol. abs\/2407.21783 (2024)."},{"key":"e_1_2_2_27_1","volume-title":"Accelerating Product Quantization Query Execution Runtime. In International Conference on Management of Data, SIGMOD.","author":"Edian Ikraduya","year":"2021","unstructured":"Ikraduya Edian. 2021. Accelerating Product Quantization Query Execution Runtime. 
In International Conference on Management of Data, SIGMOD."},{"key":"e_1_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.14778\/3303753.3303754"},{"key":"e_1_2_2_29_1","volume-title":"Chain-of-Thought Hub: A Continuous Effort to Measure Large Language Models' Reasoning Performance. CoRR","author":"Fu Yao","year":"2023","unstructured":"Yao Fu, Litu Ou, Mingyu Chen, Yuhao Wan, Hao Peng, and Tushar Khot. 2023. Chain-of-Thought Hub: A Continuous Effort to Measure Large Language Models' Reasoning Performance. CoRR, Vol. abs\/2305.17306 (2023)."},{"key":"e_1_2_2_30_1","volume-title":"International Conference on Machine Learning, ICML.","author":"Fu Yao","year":"2024","unstructured":"Yao Fu, Rameswar Panda, Xinyao Niu, Xiang Yue, Hannaneh Hajishirzi, Yoon Kim, and Hao Peng. 2024. Data Engineering for Scaling Language Models to 128K Context. In International Conference on Machine Learning, ICML."},{"key":"e_1_2_2_31_1","volume-title":"AttentionStore: Cost-effective Attention Reuse across Multi-turn Conversations in Large Language Model Serving. CoRR","author":"Gao Bin","year":"2024","unstructured":"Bin Gao, Zhuomin He, Puru Sharma, Qingxuan Kang, Djordje Jevdjic, Junbo Deng, Xingkun Yang, Zhou Yu, and Pengfei Zuo. 2024. AttentionStore: Cost-effective Attention Reuse across Multi-turn Conversations in Large Language Model Serving. CoRR, Vol. abs\/2403.19708 (2024)."},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3654970"},{"key":"e_1_2_2_33_1","volume-title":"12th International Conference on Learning Representations, ICLR.","author":"Ge Suyu","year":"2024","unstructured":"Suyu Ge, Yunan Zhang, Liyuan Liu, Minjia Zhang, Jiawei Han, and Jianfeng Gao. 2024. Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs. In 12th International Conference on Learning Representations, ICLR."},{"key":"e_1_2_2_34_1","volume-title":"Optimized Product Quantization for Approximate Nearest Neighbor Search. 
In IEEE Conference on Computer Vision and Pattern Recognition.","author":"Ge Tiezheng","year":"2013","unstructured":"Tiezheng Ge, Kaiming He, Qifa Ke, and Jian Sun. 2013. Optimized Product Quantization for Approximate Nearest Neighbor Search. In IEEE Conference on Computer Vision and Pattern Recognition."},{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11390-024-3872-3"},{"key":"e_1_2_2_36_1","volume-title":"Accelerate: Training and inference at scale made simple, efficient and adaptable. https:\/\/github.com\/huggingface\/accelerate.","author":"Gugger Sylvain","year":"2022","unstructured":"Sylvain Gugger, Lysandre Debut, Thomas Wolf, Philipp Schmid, Zachary Mueller, Sourab Mangrulkar, Marc Sun, and Benjamin Bossan. 2022. Accelerate: Training and inference at scale made simple, efficient and adaptable. https:\/\/github.com\/huggingface\/accelerate."},{"key":"e_1_2_2_37_1","volume-title":"LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models. CoRR","author":"Han Chi","year":"2023","unstructured":"Chi Han, Qifan Wang, Wenhan Xiong, Yu Chen, Heng Ji, and Sinong Wang. 2023. LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models. CoRR, Vol. abs\/2308.16137 (2023)."},{"key":"e_1_2_2_38_1","volume-title":"Kurt Keutzer, and Amir Gholami.","author":"Hooper Coleman","year":"2024","unstructured":"Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Michael W. Mahoney, Yakun Sophia Shao, Kurt Keutzer, and Amir Gholami. 2024. KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization. In Advances in Neural Information Processing Systems, NeurIPS."},{"key":"e_1_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3695988"},{"key":"e_1_2_2_40_1","volume-title":"Characterization of Large Language Model Development in the Datacenter. 
In 21st USENIX Symposium on Networked Systems Design and Implementation, NSDI.","author":"Hu Qinghao","year":"2024","unstructured":"Qinghao Hu, Zhisheng Ye, Zerui Wang, Guoteng Wang, Meng Zhang, Qiaoling Chen, Peng Sun, Dahua Lin, Xiaolin Wang, Yingwei Luo, Yonggang Wen, and Tianwei Zhang. 2024. Characterization of Large Language Model Development in the Datacenter. In 21st USENIX Symposium on Networked Systems Design and Implementation, NSDI."},{"key":"e_1_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.14778\/3380750.3380754"},{"key":"e_1_2_2_42_1","volume-title":"HyoukJoong Lee, Jiquan Ngiam, Quoc V. Le, Yonghui Wu, and Zhifeng Chen.","author":"Huang Yanping","year":"2019","unstructured":"Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Dehao Chen, Mia Xu Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V. Le, Yonghui Wu, and Zhifeng Chen. 2019. GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism. In Advances in Neural Information Processing Systems, NeurIPS."},{"key":"e_1_2_2_43_1","volume-title":"Ravishankar Krishnawamy, and Rohan Kadekodi.","author":"Subramanya Suhas Jayaram","year":"2019","unstructured":"Suhas Jayaram Subramanya, Fnu Devvrit, Harsha Vardhan Simhadri, Ravishankar Krishnawamy, and Rohan Kadekodi. 2019. Diskann: Fast accurate billion-point nearest neighbor search on a single node. In Advances in Neural Information Processing Systems, NeurIPS."},{"key":"e_1_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2010.57"},{"key":"e_1_2_2_45_1","unstructured":"Albert Q. Jiang Alexandre Sablayrolles Arthur Mensch Chris Bamford Devendra Singh Chaplot Diego de Las Casas Florian Bressand Gianna Lengyel Guillaume Lample Lucile Saulnier L\u00e9lio Renard Lavaud Marie-Anne Lachaux Pierre Stock Teven Le Scao Thibaut Lavril Thomas Wang Timoth\u00e9e Lacroix and William El Sayed. 2023a. Mistral 7B. CoRR Vol. 
abs\/2310.06825 (2023)."},{"key":"e_1_2_2_46_1","volume-title":"Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention. CoRR","author":"Jiang Huiqiang","year":"2024","unstructured":"Huiqiang Jiang, Yucheng Li, Chengruidong Zhang, Qianhui Wu, Xufang Luo, Surin Ahn, Zhenhua Han, Amir H. Abdi, Dongsheng Li, Chin-Yew Lin, Yuqing Yang, and Lili Qiu. 2024. MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention. CoRR, Vol. abs\/2407.02490 (2024)."},{"key":"e_1_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.emnlp-main.825"},{"key":"e_1_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/TBDATA.2019.2921572"},{"key":"e_1_2_2_49_1","unstructured":"Greg Kamradt. 2024. Needle-in-a-Haystack. https:\/\/github.com\/gkamradt\/LLMTest_NeedleInAHaystack"},{"key":"e_1_2_2_50_1","volume-title":"GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM. CoRR","author":"Kang Hao","year":"2024","unstructured":"Hao Kang, Qingru Zhang, Souvik Kundu, Geonhwa Jeong, Zaoxing Liu, Tushar Krishna, and Tuo Zhao. 2024. GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM. CoRR, Vol. abs\/2403.05527 (2024)."},{"key":"e_1_2_2_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/3600006.3613165"},{"key":"e_1_2_2_52_1","volume-title":"SnapKV: LLM Knows What You are Looking for Before Generation. CoRR","author":"Li Yuhong","year":"2024","unstructured":"Yuhong Li, Yingbing Huang, Bowen Yang, Bharat Venkitesh, Acyr Locatelli, Hanchen Ye, Tianle Cai, Patrick Lewis, and Deming Chen. 2024. SnapKV: LLM Knows What You are Looking for Before Generation. CoRR, Vol. abs\/2404.14469 (2024)."},{"key":"e_1_2_2_53_1","volume-title":"Large Language Models in Finance: A Survey. In 4th ACM International Conference on AI in Finance, ICAIF.","author":"Li Yinheng","year":"2023","unstructured":"Yinheng Li, Shaofei Wang, Han Ding, and Hang Chen. 2023. 
Large Language Models in Finance: A Survey. In 4th ACM International Conference on AI in Finance, ICAIF."},{"key":"e_1_2_2_54_1","volume-title":"12th International Conference on Learning Representations, ICLR.","author":"Lingle Lucas D.","year":"2024","unstructured":"Lucas D. Lingle. 2024. Transformer-VQ: Linear-Time Transformers via Vector Quantization. In 12th International Conference on Learning Representations, ICLR."},{"key":"e_1_2_2_55_1","volume-title":"World Model on Million-Length Video And Language With Blockwise RingAttention. CoRR","author":"Liu Hao","year":"2024","unstructured":"Hao Liu, Wilson Yan, Matei Zaharia, and Pieter Abbeel. 2024b. World Model on Million-Length Video And Language With Blockwise RingAttention. CoRR, Vol. abs\/2402.08268 (2024)."},{"key":"e_1_2_2_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2022.3177811"},{"key":"e_1_2_2_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/3132847.3132901"},{"key":"e_1_2_2_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/3651890.3672274"},{"key":"e_1_2_2_59_1","volume-title":"Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time. In Advances in Neural Information Processing Systems, NeurIPS.","author":"Liu Zichang","year":"2023","unstructured":"Zichang Liu, Aditya Desai, Fangshuo Liao, Weitao Wang, Victor Xie, Zhaozhuo Xu, Anastasios Kyrillidis, and Anshumali Shrivastava. 2023. Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time. In Advances in Neural Information Processing Systems, NeurIPS."},{"key":"e_1_2_2_60_1","volume-title":"International Conference on Machine Learning, ICML.","author":"Liu Zirui","year":"2024","unstructured":"Zirui Liu, Jiayi Yuan, Hongye Jin, Shaochen Zhong, Zhaozhuo Xu, Vladimir Braverman, Beidi Chen, and Xia Hu. 2024c. KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache. 
In International Conference on Machine Learning, ICML."},{"key":"e_1_2_2_61_1","doi-asserted-by":"publisher","DOI":"10.14778\/3489496.3489506"},{"key":"e_1_2_2_62_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2018.2889473"},{"key":"e_1_2_2_63_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11432-022-3581-9"},{"key":"e_1_2_2_64_1","volume-title":"Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems. CoRR","author":"Miao Xupeng","year":"2023","unstructured":"Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Hongyi Jin, Tianqi Chen, and Zhihao Jia. 2023b. Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems. CoRR, Vol. abs\/2312.15234 (2023)."},{"key":"e_1_2_2_65_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.emnlp-main.759"},{"key":"e_1_2_2_66_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1206"},{"key":"e_1_2_2_67_1","volume-title":"International Conference on Machine Learning, ICML.","author":"Nguyen Tung","year":"2023","unstructured":"Tung Nguyen, Johannes Brandstetter, Ashish Kapoor, Jayesh K. Gupta, and Aditya Grover. 2023. ClimaX: A foundation model for weather and climate. In International Conference on Machine Learning, ICML."},{"key":"e_1_2_2_68_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE53745.2022.00241"},{"key":"e_1_2_2_69_1","unstructured":"OpenAI. 2023. GPT-4 Technical Report. CoRR Vol. abs\/2303.08774 (2023)."},{"key":"e_1_2_2_70_1","unstructured":"Zhuoshi Pan Qianhui Wu Huiqiang Jiang Menglin Xia Xufang Luo Jue Zhang Qingwei Lin Victor R\u00fchle Yuqing Yang Chin-Yew Lin H. Vicky Zhao Lili Qiu and Dongmei Zhang. 2024. LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression. 
In Findings of the Association for Computational Linguistics ACL."},{"key":"e_1_2_2_71_1","volume-title":"PyTorch: An Imperative Style","author":"Paszke Adam","unstructured":"Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas K\u00f6pf, Edward Z. Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems, NeurIPS."},{"key":"e_1_2_2_72_1","volume-title":"Scalable Diffusion Models with Transformers. In IEEE\/CVF International Conference on Computer Vision, ICCV.","author":"Peebles William","year":"2023","unstructured":"William Peebles and Saining Xie. 2023. Scalable Diffusion Models with Transformers. In IEEE\/CVF International Conference on Computer Vision, ICCV."},{"key":"e_1_2_2_73_1","unstructured":"Qwen Team. 2024. Qwen2.5: A Party of Foundation Models. https:\/\/qwenlm.github.io\/blog\/qwen2.5\/"},{"key":"e_1_2_2_74_1","doi-asserted-by":"publisher","DOI":"10.1145\/3458817.3476205"},{"key":"e_1_2_2_75_1","volume-title":"ZeRO-Offload: Democratizing Billion-Scale Model Training. In USENIX Annual Technical Conference, ATC.","author":"Ren Jie","year":"2021","unstructured":"Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyan Yang, Minjia Zhang, Dong Li, and Yuxiong He. 2021. ZeRO-Offload: Democratizing Billion-Scale Model Training. In USENIX Annual Technical Conference, ATC."},{"key":"e_1_2_2_76_1","doi-asserted-by":"publisher","DOI":"10.14778\/3151106.3151108"},{"key":"e_1_2_2_77_1","volume-title":"Zhu","author":"Ren Siyu","year":"2024","unstructured":"Siyu Ren and Kenny Q. Zhu. 2024. On the Efficacy of Eviction Policy for Key-Value Constrained Generative Language Model Inference. CoRR, Vol. 
abs\/2402.06262 (2024)."},{"key":"e_1_2_2_78_1","volume-title":"49th Annual IEEE\/ACM International Symposium on Microarchitecture, MICRO.","author":"Rhu Minsoo","unstructured":"Minsoo Rhu, Natalia Gimelshein, Jason Clemons, Arslan Zulfiqar, and Stephen W. Keckler. 2016. vDNN: Virtualized deep neural networks for scalable, memory-efficient neural network design. In 49th Annual IEEE\/ACM International Symposium on Microarchitecture, MICRO."},{"key":"e_1_2_2_79_1","volume-title":"SparQ Attention: Bandwidth-Efficient LLM Inference. In International Conference on Machine Learning, ICML.","author":"Ribar Luka","year":"2024","unstructured":"Luka Ribar, Ivan Chelombiev, Luke Hudlass-Galley, Charlie Blake, Carlo Luschi, and Douglas Orr. 2024. SparQ Attention: Bandwidth-Efficient LLM Inference. In International Conference on Machine Learning, ICML."},{"key":"e_1_2_2_80_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.aiopen.2022.01.001"},{"key":"e_1_2_2_81_1","volume-title":"International Conference on Machine Learning, ICML.","author":"Sheng Ying","year":"2023","unstructured":"Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, Beidi Chen, Percy Liang, Christopher R\u00e9, Ion Stoica, and Ce Zhang. 2023. FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU. In International Conference on Machine Learning, ICML."},{"key":"e_1_2_2_82_1","volume-title":"Compositional Embeddings Using Complementary Partitions for Memory-Efficient Recommendation Systems. In The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining.","author":"Michael Shi Hao-Jun","year":"2020","unstructured":"Hao-Jun Michael Shi, Dheevatsa Mudigere, Maxim Naumov, and Jiyan Yang. 2020. Compositional Embeddings Using Complementary Partitions for Memory-Efficient Recommendation Systems. 
In The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining."},{"key":"e_1_2_2_83_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11390-024-4058-8"},{"key":"e_1_2_2_84_1","volume-title":"Fault-tolerant Generative LLM Serving. In International Conference on Machine Learning, ICML.","author":"Strati Foteini","year":"2024","unstructured":"Foteini Strati, Sara McAllister, Amar Phanishayee, Jakub Tarnawski, and Ana Klimovic. 2024. D\u00e9j\u00e0Vu: KV-cache Streaming for Fast, Fault-tolerant Generative LLM Serving. In International Conference on Machine Learning, ICML."},{"key":"e_1_2_2_85_1","volume-title":"RazorAttention: Efficient KV Cache Compression Through Retrieval Heads. CoRR","author":"Tang Hanlin","year":"2024","unstructured":"Hanlin Tang, Yang Lin, Jing Lin, Qingsen Han, Shikuan Hong, Yiwu Yao, and Gongyi Wang. 2024. RazorAttention: Efficient KV Cache Compression Through Retrieval Heads. CoRR, Vol. abs\/2407.15891 (2024)."},{"key":"e_1_2_2_86_1","first-page":"7","article-title":"Alpaca: A strong, replicable instruction-following model","volume":"3","author":"Taori Rohan","year":"2023","unstructured":"Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li, Carlos Guestrin, Percy Liang, and Tatsunori B Hashimoto. 2023. Alpaca: A strong, replicable instruction-following model. Stanford Center for Research on Foundation Models, Vol. 3, 6 (2023), 7.","journal-title":"Stanford Center for Research on Foundation Models"},{"key":"e_1_2_2_87_1","unstructured":"Together.ai. 2023. LLaMA-2--7B-32K. 
https:\/\/huggingface.co\/togethercomputer\/LLaMA-2--7B-32K"},{"key":"e_1_2_2_88_1","unstructured":"Hugo Touvron Louis Martin Kevin Stone Peter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra Prajjwal Bhargava Shruti Bhosale Dan Bikel Lukas Blecher Cristian Canton-Ferrer Moya Chen Guillem Cucurull David Esiobu Jude Fernandes Jeremy Fu Wenyin Fu Brian Fuller Cynthia Gao Vedanuj Goswami Naman Goyal Anthony Hartshorn Saghar Hosseini Rui Hou Hakan Inan Marcin Kardas Viktor Kerkez Madian Khabsa Isabel Kloumann Artem Korenev Punit Singh Koura Marie-Anne Lachaux Thibaut Lavril Jenya Lee Diana Liskovich Yinghai Lu Yuning Mao Xavier Martinet Todor Mihaylov Pushkar Mishra Igor Molybog Yixin Nie Andrew Poulton Jeremy Reizenstein Rashi Rungta Kalyan Saladi Alan Schelten Ruan Silva Eric Michael Smith Ranjan Subramanian Xiaoqing Ellen Tan Binh Tang Ross Taylor Adina Williams Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang Angela Fan Melanie Kambadur Sharan Narang Aur\u00e9lien Rodriguez Robert Stojnic Sergey Edunov and Thomas Scialom. 2023. Llama 2: Open Foundation and Fine-Tuned Chat Models. CoRR Vol. abs\/2307.09288 (2023)."},{"key":"e_1_2_2_89_1","unstructured":"A\u00e4ron van den Oord Oriol Vinyals and Koray Kavukcuoglu. 2017. Neural Discrete Representation Learning. In Advances in Neural Information Processing Systems NeurIPS."},{"key":"e_1_2_2_90_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476249.3476255"},{"key":"e_1_2_2_91_1","doi-asserted-by":"publisher","DOI":"10.14778\/3424573.3424580"},{"key":"e_1_2_2_92_1","doi-asserted-by":"publisher","DOI":"10.1109\/COMST.2020.2970550"},{"key":"e_1_2_2_93_1","volume-title":"SqueezeAttention: 2D Management of KV-Cache in LLM Inference via Layer-wise Optimal Budget. CoRR","author":"Wang Zihao","year":"2024","unstructured":"Zihao Wang, Bin Cui, and Shaoduo Gan. 2024. SqueezeAttention: 2D Management of KV-Cache in LLM Inference via Layer-wise Optimal Budget. CoRR, Vol. 
abs\/2404.04793 (2024)."},{"key":"e_1_2_2_94_1","volume-title":"Quoc V. Le, and Denny Zhou.","author":"Wei Jason","year":"2022","unstructured":"Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, and Denny Zhou. 2022. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. In Advances in Neural Information Processing Systems, NeurIPS."},{"key":"e_1_2_2_95_1","volume-title":"MLaaS in the Wild: Workload Analysis and Scheduling in Large-Scale Heterogeneous GPU Clusters. In 19th USENIX Symposium on Networked Systems Design and Implementation, NSDI.","author":"Weng Qizhen","year":"2022","unstructured":"Qizhen Weng, Wencong Xiao, Yinghao Yu, Wei Wang, Cheng Wang, Jian He, Yong Li, Liping Zhang, Wei Lin, and Yu Ding. 2022. MLaaS in the Wild: Workload Analysis and Scheduling in Large-Scale Heterogeneous GPU Clusters. In 19th USENIX Symposium on Networked Systems Design and Implementation, NSDI."},{"key":"e_1_2_2_96_1","volume-title":"Scalable Distributed Inverted List Indexes in Disaggregated Memory. In International Conference on Management of Data, SIGMOD.","author":"Widmoser Manuel","year":"2024","unstructured":"Manuel Widmoser, Daniel Kocher, and Nikolaus Augsten. 2024. Scalable Distributed Inverted List Indexes in Disaggregated Memory. In International Conference on Management of Data, SIGMOD."},{"key":"e_1_2_2_97_1","volume-title":"Transparent GPU Sharing in Container Clouds for Deep Learning Workloads. In 20th USENIX Symposium on Networked Systems Design and Implementation, NSDI.","author":"Wu Bingyang","year":"2023","unstructured":"Bingyang Wu, Zili Zhang, Zhihao Bai, Xuanzhe Liu, and Xin Jin. 2023. Transparent GPU Sharing in Container Clouds for Deep Learning Workloads. 
In 20th USENIX Symposium on Networked Systems Design and Implementation, NSDI."},{"key":"e_1_2_2_98_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11280-024-01291-2"},{"key":"e_1_2_2_99_1","volume-title":"InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory. CoRR","author":"Xiao Chaojun","year":"2024","unstructured":"Chaojun Xiao, Pengle Zhang, Xu Han, Guangxuan Xiao, Yankai Lin, Zhengyan Zhang, Zhiyuan Liu, Song Han, and Maosong Sun. 2024b. InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory. CoRR, Vol. abs\/2402.04617 (2024)."},{"key":"e_1_2_2_100_1","volume-title":"Efficient Streaming Language Models with Attention Sinks. In 12th International Conference on Learning Representations, ICLR.","author":"Xiao Guangxuan","year":"2024","unstructured":"Guangxuan Xiao, Yuandong Tian, Beidi Chen, Song Han, and Mike Lewis. 2024a. Efficient Streaming Language Models with Attention Sinks. In 12th International Conference on Learning Representations, ICLR."},{"key":"e_1_2_2_101_1","doi-asserted-by":"publisher","DOI":"10.1145\/3477495.3531799"},{"key":"e_1_2_2_102_1","volume-title":"Se Jung Kwon, and Dongsoo Lee","author":"Yang June Yong","year":"2024","unstructured":"June Yong Yang, Byeongwook Kim, Jeongin Bae, Beomseok Kwon, Gunho Park, Eunho Yang, Se Jung Kwon, and Dongsoo Lee. 2024. No Token Left Behind: Reliable KV Cache Compression via Importance-Aware Mixed Precision Quantization. CoRR, Vol. abs\/2402.18096 (2024)."},{"key":"e_1_2_2_103_1","doi-asserted-by":"publisher","DOI":"10.1002\/hcs2.61"},{"key":"e_1_2_2_104_1","volume-title":"PASE: PostgreSQL Ultra-High-Dimensional Approximate Nearest Neighbor Search Extension. In International Conference on Management of Data, SIGMOD.","author":"Yang Wen","year":"2020","unstructured":"Wen Yang, Tao Li, Gai Fang, and Hong Wei. 2020. 
PASE: PostgreSQL Ultra-High-Dimensional Approximate Nearest Neighbor Search Extension. In International Conference on Management of Data, SIGMOD."},{"key":"e_1_2_2_105_1","doi-asserted-by":"publisher","DOI":"10.1145\/3638757"},{"key":"e_1_2_2_106_1","volume-title":"Orca: A Distributed Serving System for Transformer-Based Generative Models. In 16th USENIX Symposium on Operating Systems Design and Implementation, OSDI.","author":"Yu Gyeong-In","year":"2022","unstructured":"Gyeong-In Yu, Joo Seong Jeong, Geon-Woo Kim, Soojeong Kim, and Byung-Gon Chun. 2022. Orca: A Distributed Serving System for Transformer-Based Generative Models. In 16th USENIX Symposium on Operating Systems Design and Implementation, OSDI."},{"key":"e_1_2_2_107_1","doi-asserted-by":"publisher","DOI":"10.1145\/3639306"},{"key":"e_1_2_2_108_1","doi-asserted-by":"crossref","unstructured":"Hailin Zhang Yujing Wang Qi Chen Ruiheng Chang Ting Zhang Ziming Miao Yingyan Hou Yang Ding Xupeng Miao Haonan Wang Bochen Pang Yuefeng Zhan Hao Sun Weiwei Deng Qi Zhang Fan Yang Xing Xie Mao Yang and Bin Cui. 2023b. Model-enhanced Vector Index. In Advances in Neural Information Processing Systems NeurIPS.","DOI":"10.52202\/075280-2396"},{"key":"e_1_2_2_109_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11432-023-3956-3"},{"key":"e_1_2_2_110_1","doi-asserted-by":"publisher","DOI":"10.14778\/3636218.3636234"},{"key":"e_1_2_2_111_1","volume-title":"Proceedings of the 62nd Conference of the Association for Computational Linguistics (Volume 1: Long Papers), ACL.","author":"Zhang Xinrong","year":"2024","unstructured":"Xinrong Zhang, Yingfa Chen, Shengding Hu, Zihang Xu, Junhao Chen, Moo Hao, Xu Han, Zhen Thai, Shuo Wang, Zhiyuan Liu, and Maosong Sun. 2024a. \u221eBench: Extending Long Context Evaluation Beyond 100K Tokens. 
In Proceedings of the 62nd Conference of the Association for Computational Linguistics (Volume 1: Long Papers), ACL."},{"key":"e_1_2_2_112_1","unstructured":"Zhenyu Zhang Ying Sheng Tianyi Zhou Tianlong Chen Lianmin Zheng Ruisi Cai Zhao Song Yuandong Tian Christopher R\u00e9 Clark W. Barrett Zhangyang Wang and Beidi Chen. 2023a. H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models. In Advances in Neural Information Processing Systems NeurIPS."},{"key":"e_1_2_2_113_1","doi-asserted-by":"publisher","DOI":"10.14778\/3594512.3594527"},{"key":"e_1_2_2_114_1","volume-title":"Open-Sora: Democratizing Efficient Video Production for All. CoRR","author":"Zheng Zangwei","year":"2024","unstructured":"Zangwei Zheng, Xiangyu Peng, Tianji Yang, Chenhui Shen, Shenggui Li, Hongxin Liu, Yukun Zhou, Tianyi Li, and Yang You. 2024. Open-Sora: Democratizing Efficient Video Production for All. CoRR, Vol. abs\/2412.20404 (2024)."},{"key":"e_1_2_2_115_1","volume-title":"DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving. In 18th USENIX Symposium on Operating Systems Design and Implementation, OSDI.","author":"Zhong Yinmin","year":"2024","unstructured":"Yinmin Zhong, Shengyu Liu, Junda Chen, Jianbo Hu, Yibo Zhu, Xuanzhe Liu, Xin Jin, and Hao Zhang. 2024. DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving. In 18th USENIX Symposium on Operating Systems Design and Implementation, OSDI."},{"key":"e_1_2_2_116_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2018.00085"},{"key":"e_1_2_2_117_1","doi-asserted-by":"publisher","DOI":"10.1007\/s41019-023-00235-6"},{"key":"e_1_2_2_118_1","volume-title":"Relational Data Cleaning Meets Artificial Intelligence: A Survey. Data Science and Engineering","author":"Zhu Jingyu","year":"2024","unstructured":"Jingyu Zhu, Xintong Zhao, Yu Sun, Shaoxu Song, and Xiaojie Yuan. 2024. 
Relational Data Cleaning Meets Artificial Intelligence: A Survey. Data Science and Engineering (2024), 1-28."}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3725338","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T18:52:12Z","timestamp":1774983132000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3725338"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,17]]},"references-count":118,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2025,6,17]]}},"alternative-id":["10.1145\/3725338"],"URL":"https:\/\/doi.org\/10.1145\/3725338","relation":{},"ISSN":["2836-6573"],"issn-type":[{"value":"2836-6573","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,17]]}}}