{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,11]],"date-time":"2026-01-11T03:51:56Z","timestamp":1768103516304,"version":"3.49.0"},"reference-count":56,"publisher":"Association for Computing Machinery (ACM)","issue":"6","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2025,12,4]]},"abstract":"<jats:p>Approximate Nearest Neighbor Search (ANNS) plays a critical role in applications such as search engines, recommender systems, and RAG for LLMs. Vector quantization (VQ), a crucial technique for ANNS, is commonly used to reduce space overhead and accelerate distance computations. However, despite significant research advances, state-of-the-art VQ methods still face challenges in balancing encoding efficiency and quantization accuracy. To address these limitations, we propose a novel VQ method called SAQ. To improve accuracy, SAQ employs a new dimension segmentation technique to strategically partition PCA-projected vectors into segments along their dimensions. By prioritizing leading dimension segments with larger magnitudes, SAQ allocates more bits to high-impact segments, optimizing the use of the available space quota. An efficient dynamic programming algorithm is developed to optimize dimension segmentation and bit allocation, ensuring minimal quantization error. To speed up vector encoding, SAQ devises a code adjustment technique to first quantize each dimension independently and then progressively refine quantized vectors using a coordinate-descent-like approach to avoid exhaustive enumeration. Extensive experiments demonstrate SAQ's superiority over classical methods (e.g., PQ, PCA) and recent state-of-the-art approaches (e.g., LVQ, Extended RabitQ). SAQ achieves up to 80% reduction in quantization error and accelerates encoding speed by over 80\u00d7 compared to Extended RabitQ.<\/jats:p>","DOI":"10.1145\/3769824","type":"journal-article","created":{"date-parts":[[2025,12,6]],"date-time":"2025-12-06T04:32:13Z","timestamp":1764995533000},"page":"1-25","source":"Crossref","is-referenced-by-count":0,"title":["SAQ: Pushing the Limits of Vector Quantization through Code Adjustment and Dimension Segmentation"],"prefix":"10.1145","volume":"3","author":[{"ORCID":"https:\/\/orcid.org\/0009-0009-0772-6588","authenticated-orcid":false,"given":"Hui","family":"Li","sequence":"first","affiliation":[{"name":"The Chinese University of Hong Kong, Hong Kong, Hong Kong"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-6088-6313","authenticated-orcid":false,"given":"Shiyuan","family":"Deng","sequence":"additional","affiliation":[{"name":"Huawei Cloud, Hong Kong, Hong Kong"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2122-915X","authenticated-orcid":false,"given":"Xiao","family":"Yan","sequence":"additional","affiliation":[{"name":"Institute for Math &amp; AI, Wuhan University, Wuhan, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-0122-9240","authenticated-orcid":false,"given":"Xiangyu","family":"Zhi","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong, Hong Kong, Hong Kong"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6313-6288","authenticated-orcid":false,"given":"James","family":"Cheng","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong, Hong Kong, Hong Kong"}]}],"member":"320","published-online":{"date-parts":[[2025,12,5]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.14778\/3611479.3611537"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/FOCS.2017.65"},{"key":"e_1_2_1_3_1","volume-title":"42nd International Conference on Very Large Data Bases","volume":"9","author":"Andr\u00e9 Fabien","year":"2016","unstructured":"Fabien Andr\u00e9, Anne-Marie Kermarrec, and Nicolas Le Scouarnec. 2016. Cache locality is not enough: High-performance nearest neighbor search with product quantization fast scan. In 42nd International Conference on Very Large Data Bases, Vol. 9. 12."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.is.2019.02.006"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.124"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-0-387-74161-1_36"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.3390\/s101211259"},{"key":"e_1_2_1_8_1","volume-title":"Cohere Embed V3. https:\/\/cohere.com\/blog\/introducing-embed-v3","year":"2023","unstructured":"Cohere. [n.d.]. Cohere Embed V3. https:\/\/cohere.com\/blog\/introducing-embed-v3 (2023)."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.14778\/3712221.3712244"},{"key":"e_1_2_1_10_1","first-page":"4171","volume-title":"Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies","volume":"1","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers). 4171-4186."},{"key":"e_1_2_1_11_1","volume-title":"https:\/\/www.elastic.co\/enterprise-search\/vector-search","year":"2024","unstructured":"Elastic. [n.d.]. Elastic. https:\/\/www.elastic.co\/enterprise-search\/vector-search (2024)."},{"key":"e_1_2_1_12_1","volume-title":"https:\/\/github.com\/facebookresearch\/faiss","year":"2023","unstructured":"Faiss. [n.d.]. Faiss. https:\/\/github.com\/facebookresearch\/faiss (2023)."},{"key":"e_1_2_1_13_1","unstructured":"Jonathan T Foote. 2003. TreeQ Manual V0. 8."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.14778\/3303753.3303754"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2213836.2213898"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3725413"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3589282"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3654970"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2013.379"},{"key":"e_1_2_1_20_1","first-page":"482","article-title":"Quantization based fast inner product search. In Artificial intelligence and statistics","author":"Guo Ruiqi","year":"2016","unstructured":"Ruiqi Guo, Sanjiv Kumar, Krzysztof Choromanski, and David Simcha. 2016. Quantization based fast inner product search. In Artificial intelligence and statistics. PMLR, 482-490.","journal-title":"PMLR"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394486.3403305"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.14778\/2850469.2850470"},{"key":"e_1_2_1_23_1","volume-title":"Product quantization for nearest neighbor search","author":"Jegou Herve","year":"2010","unstructured":"Herve Jegou, Matthijs Douze, and Cordelia Schmid. 2010. Product quantization for nearest neighbor search. IEEE transactions on pattern analysis and machine intelligence, Vol. 33, 1 (2010), 117-128."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2010.57"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.298"},{"key":"e_1_2_1_26_1","unstructured":"Patrick Lewis Ethan Perez Aleksandra Piktus Fabio Petroni Vladimir Karpukhin Naman Goyal Heinrich K\u00fcttler Mike Lewis Wen-tau Yih Tim Rockt\u00e4schel et al. 2020. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in neural information processing systems Vol. 33 (2020) 9459-9474."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3064009"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2019.2909204"},{"key":"e_1_2_1_29_1","first-page":"1092","volume-title":"Science","volume":"378","author":"Li Yujia","year":"2022","unstructured":"Yujia Li, David Choi, Junyoung Chung, Nate Kushman, Julian Schrittwieser, R\u00e9mi Leblond, Tom Eccles, James Keeling, Felix Gimeno, Agustin Dal Lago, et al., 2022. Competition-level code generation with alphacode. Science, Vol. 378, 6624 (2022), 1092-1097."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2016.2593344"},{"key":"e_1_2_1_31_1","volume-title":"Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs","author":"Malkov Yu A","year":"2018","unstructured":"Yu A Malkov and Dmitry A Yashunin. 2018a. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. IEEE transactions on pattern analysis and machine intelligence, Vol. 42, 4 (2018), 824-836."},{"key":"e_1_2_1_32_1","volume-title":"Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs","author":"Malkov Yu A","year":"2018","unstructured":"Yu A Malkov and Dmitry A Yashunin. 2018b. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. IEEE transactions on pattern analysis and machine intelligence, Vol. 42, 4 (2018), 824-836."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46475-6_9"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01270-0_30"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.3169\/mta.6.2"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3589777"},{"key":"e_1_2_1_37_1","volume-title":"New embedding models and API updates., https:\/\/openai.com\/index\/new-embedding-models-and-api-updates\/","author":"AI.","year":"2024","unstructured":"OpenAI. [n.d.]. New embedding models and API updates., https:\/\/openai.com\/index\/new-embedding-models-and-api-updates\/ (2024)."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/SISAP.2008.18"},{"key":"e_1_2_1_39_1","volume-title":"pgvector. https:\/\/github.com\/pgvector\/pgvector","year":"2024","unstructured":"pgvector. [n.d.]. pgvector. https:\/\/github.com\/pgvector\/pgvector (2024)."},{"key":"e_1_2_1_40_1","volume-title":"https:\/\/qdrant.tech\/","year":"2024","unstructured":"Qdrant. [n.d.]. Qdrant. https:\/\/qdrant.tech\/ (2024)."},{"key":"e_1_2_1_41_1","volume-title":"International conference on machine learning. PmLR, 8748-8763","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al., 2021. Learning transferable visual models from natural language supervision. In International conference on machine learning. PmLR, 8748-8763."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01939"},{"key":"e_1_2_1_43_1","volume-title":"https:\/\/www.singlestore.com\/built-in-vector","year":"2024","unstructured":"SingleStore. [n.d.]. SingleStore. https:\/\/www.singlestore.com\/built-in-vector (2024)."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3654990"},{"key":"e_1_2_1_45_1","first-page":"3189","article-title":"SOAR: improved indexing for approximate nearest neighbor search","volume":"36","author":"Sun Philip","year":"2023","unstructured":"Philip Sun, David Simcha, Dave Dopson, Ruiqi Guo, and Sanjiv Kumar. 2023. SOAR: improved indexing for approximate nearest neighbor search. Advances in Neural Information Processing Systems, Vol. 36 (2023), 3189-3204.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_1_46_1","article-title":"LeanVec: Searching vectors faster by making them fit","volume":"2024","author":"Tepper Mariano","year":"2024","unstructured":"Mariano Tepper, Ishwar Singh Bhati, Cecilia Aguerrebere, Mark Hildebrand, and Theodore L. Willke. 2024b. LeanVec: Searching vectors faster by making them fit. Trans. Mach. Learn. Res., Vol. 2024 (2024). https:\/\/openreview.net\/forum?id=wczqrpOrIc","journal-title":"Trans. Mach. Learn. Res."},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2410.22347"},{"key":"e_1_2_1_48_1","volume-title":"https:\/\/trec-rag.github.io\/annoucements\/2024-corpus-finalization\/","author":"TREC-RAG. [n.d.]. TREC-RAG Corp","year":"2024","unstructured":"TREC-RAG. [n.d.]. TREC-RAG Corpus 2024. https:\/\/trec-rag.github.io\/annoucements\/2024-corpus-finalization\/ (2024)."},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3457550"},{"key":"e_1_2_1_50_1","volume-title":"Composite quantization","author":"Wang Jingdong","year":"2018","unstructured":"Jingdong Wang and Ting Zhang. 2018. Composite quantization. IEEE transactions on pattern analysis and machine intelligence, Vol. 41, 6 (2018), 1308-1322."},{"key":"e_1_2_1_51_1","volume-title":"Heng Tao Shen, et al","author":"Wang Jingdong","year":"2017","unstructured":"Jingdong Wang, Ting Zhang, Nicu Sebe, Heng Tao Shen, et al., 2017. A survey on learning to hash. IEEE transactions on pattern analysis and machine intelligence, Vol. 40, 4 (2017), 769-790."},{"key":"e_1_2_1_52_1","volume-title":"Coordinate descent algorithms. Mathematical programming","author":"Wright Stephen J","year":"2015","unstructured":"Stephen J Wright. 2015. Coordinate descent algorithms. Mathematical programming, Vol. 151, 1 (2015), 3-34."},{"key":"e_1_2_1_53_1","volume-title":"Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval. In 9th International Conference on Learning Representations, ICLR 2021","author":"Xiong Lee","year":"2021","unstructured":"Lee Xiong, Chenyan Xiong, Ye Li, Kwok-Fung Tang, Jialin Liu, Paul N. Bennett, Junaid Ahmed, and Arnold Overwijk. 2021. Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net. https:\/\/openreview.net\/forum?id=zeFrfgyZln"},{"key":"e_1_2_1_54_1","unstructured":"Mingyu Yang Wentao Li and Wei Wang. 2025. Fast High-dimensional Approximate Nearest Neighbor Search with Efficient Index Time and Space. arXiv:2411.06158 [cs.DB] https:\/\/arxiv.org\/abs\/2411.06158"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3386131"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2508.15290"}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3769824","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,10]],"date-time":"2026-01-10T23:29:01Z","timestamp":1768087741000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3769824"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,4]]},"references-count":56,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2025,12,4]]}},"alternative-id":["10.1145\/3769824"],"URL":"https:\/\/doi.org\/10.1145\/3769824","relation":{},"ISSN":["2836-6573"],"issn-type":[{"value":"2836-6573","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,12,4]]}}}