{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,4]],"date-time":"2026-05-04T23:20:33Z","timestamp":1777936833543,"version":"3.51.4"},"reference-count":77,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2025,2,10]],"date-time":"2025-02-10T00:00:00Z","timestamp":1739145600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"The National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["No. 62322213 and 62172419"],"award-info":[{"award-number":["No. 62322213 and 62172419"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2025,2,10]]},"abstract":"<jats:p>Approximate Nearest Neighbor Search (ANNS) is a critical problem in vector databases. Cluster-based index is utilized to narrow the search scope of ANNS, thereby accelerating the search process. Due to its scalability, it is widely employed in real-world vector search systems. However, existing cluster-based indexes often suffer from coarse granularity, requiring query vectors to compute distances with vectors of varying quality, thus increasing query complexity. Existing work aim to represent vectors with minimal cost, such as using product quantization (PQ) or linear transformations, to speed up ANNS. However, these approaches do not address the coarse granularity inherent in cluster-based index. In this paper, we present an efficient vector data query engine to enhance the granularity of cluster-based index by carefully subdividing clusters using diverse distance metrics. Building on this refined index, we introduce techniques that leverage triangle inequalities to develop highly optimized and distinct search strategies for clusters and vectors of varying qualities, thereby reducing the overhead of ANNS. Extensive experiments demonstrate that our method significantly outperforms existing in-memory cluster-based indexing algorithms, achieving up to an impressive 10\u00d7 speedup and a pruning ratio exceeding 99.4%.<\/jats:p>","DOI":"10.1145\/3709743","type":"journal-article","created":{"date-parts":[[2025,2,11]],"date-time":"2025-02-11T15:45:06Z","timestamp":1739288706000},"page":"1-28","source":"Crossref","is-referenced-by-count":9,"title":["Tribase: A Vector Data Query Engine for Reliable and Lossless Pruning Compression using Triangle Inequalities"],"prefix":"10.1145","volume":"3","author":[{"ORCID":"https:\/\/orcid.org\/0009-0000-3276-3262","authenticated-orcid":false,"given":"Qian","family":"Xu","sequence":"first","affiliation":[{"name":"Renmin University of China, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-2338-6183","authenticated-orcid":false,"given":"Juan","family":"Yang","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1983-7321","authenticated-orcid":false,"given":"Feng","family":"Zhang","sequence":"additional","affiliation":[{"name":"Renmin University of China, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-2070-8290","authenticated-orcid":false,"given":"Junda","family":"Pan","sequence":"additional","affiliation":[{"name":"Renmin University of China, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8368-1109","authenticated-orcid":false,"given":"Kang","family":"Chen","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-3763-7834","authenticated-orcid":false,"given":"Youren","family":"Shen","sequence":"additional","affiliation":[{"name":"Beijing HaiZhi XingTu Technology Co., Ltd., Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9773-9332","authenticated-orcid":false,"given":"Amelie Chi","family":"Zhou","sequence":"additional","affiliation":[{"name":"Hong Kong Baptist University, Hong Kong, Hong Kong"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5757-9135","authenticated-orcid":false,"given":"Xiaoyong","family":"Du","sequence":"additional","affiliation":[{"name":"Renmin University of China, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2025,2,11]]},"reference":[{"key":"e_1_2_2_1_1","unstructured":"2024. UCI Machine Learning Repository. https:\/\/archive.ics.uci.edu\/."},{"key":"e_1_2_2_2_1","unstructured":"2024. YouTube. https:\/\/www.youtube.com\/. https:\/\/www.youtube.com"},{"key":"e_1_2_2_3_1","doi-asserted-by":"crossref","unstructured":"Artem Babenko Victor Lempitsky B.Hari Babu N.Subhash Chandra and T.V. Gopal. 2014. The inverted multi-index. IEEE transactions on pattern analysis and machine intelligence 37 6 (2014) 1247--1260.","DOI":"10.1109\/TPAMI.2014.2361319"},{"key":"e_1_2_2_4_1","article-title":"Breaking the Curse of Dimensionality with Convex Neural Networks","volume":"18","author":"Bach Francis R.","year":"2017","unstructured":"Francis R. Bach. 2017. Breaking the Curse of Dimensionality with Convex Neural Networks. J. Mach. Learn. Res. 18 (2017), 19:1--19:53. http:\/\/jmlr.org\/papers\/v18\/14--546.html","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01258-8_13"},{"key":"e_1_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/361002.361007"},{"key":"e_1_2_2_7_1","unstructured":"Tom B. Brown Benjamin Mann Nick Ryder Melanie Subbiah Jared Kaplan Prafulla Dhariwal Arvind Neelakantan Pranav Shyam Girish Sastry Amanda Askell Sandhini Agarwal Ariel Herbert-Voss Gretchen Krueger Tom Henighan Rewon Child Aditya Ramesh Daniel M. Ziegler Jeffrey Wu Clemens Winter Christopher Hesse Mark Chen Eric Sigler Mateusz Litwin Scott Gray Benjamin Chess Jack Clark Christopher Berner Sam McCandlish Alec Radford Ilya Sutskever and Dario Amodei. 2020. Language Models are Few-Shot Learners. arXiv:2005.14165 [cs.CL] https:\/\/arxiv.org\/abs\/2005.14165"},{"key":"e_1_2_2_8_1","volume-title":"Spann: Highly-efficient billion-scale approximate nearest neighborhood search. In Advances in Neural Information Processing Systems","author":"Chen Qi","year":"2021","unstructured":"Qi Chen, Bing Zhao, Haidong Wang, Mingqin Li, Chuanjie Liu, Zengzhong Li, Mao Yang, and Jingdong Wang. 2021. Spann: Highly-efficient billion-scale approximate nearest neighborhood search. In Advances in Neural Information Processing Systems, M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J.Wortman Vaughan (Eds.). Vol. 34. Curran Associates, Inc, 5199--5212."},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.14778\/3681954.3681979"},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/177424.177609"},{"key":"e_1_2_2_11_1","volume-title":"Proceedings of the Twentieth Annual Symposium on Computational Geometry, SCG '04","author":"Datar Mayur","unstructured":"Mayur Datar, Nicole Immorlica, Piotr Indyk, and Va-hab S. Mirrokni. 2004. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the Twentieth Annual Symposium on Computational Geometry, SCG '04. 253--262."},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/2408776.2408794"},{"key":"e_1_2_2_13_1","first-page":"793","article-title":"Sur la sph\u00e8re vide","volume":"6","author":"Delaunay B.N.","year":"1934","unstructured":"B.N. Delaunay. 1934. Sur la sph\u00e8re vide. Bull. Acad. Sci. URSS 6 (1934), 793--800.","journal-title":"Bull. Acad. Sci. URSS"},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3140587.3062377"},{"key":"e_1_2_2_15_1","volume-title":"TOP: A Compiler-Based Framework for Optimizing Machine Learning Algorithms through Generalized Triangle Inequality. https:\/\/api.semanticscholar.org\/CorpusID:6500348","author":"Ding Yufei","year":"2018","unstructured":"Yufei Ding, Lin Ning, Hui Guang, Xipeng Shen, and Madan Musuvathi. 2018. TOP: A Compiler-Based Framework for Optimizing Machine Learning Algorithms through Generalized Triangle Inequality. https:\/\/api.semanticscholar.org\/CorpusID:6500348"},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1963405.1963487"},{"key":"e_1_2_2_17_1","unstructured":"Facebook. 2020. Faiss. https:\/\/github.com\/facebookresearch\/faiss."},{"key":"e_1_2_2_18_1","volume-title":"EFANNA : An Extremely Fast Approximate Nearest Neighbor Search Algorithm Based on kNN Graph.","author":"Fu Cong","year":"2016","unstructured":"Cong Fu and Deng Cai. 2016. EFANNA : An Extremely Fast Approximate Nearest Neighbor Search Algorithm Based on kNN Graph."},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.14778\/3303753.3303754"},{"key":"e_1_2_2_20_1","volume-title":"Sokal","author":"Ruben Gabriel K.","year":"1969","unstructured":"K.Ruben Gabriel and Robert R. Sokal. 1969. A new statistical approach to geographic variation analysis. Systematic zoology 18, 3 (1969), 259--278."},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3589282"},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1609\/AAAI.V38I18.29960"},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.5555\/645925.671516"},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1037\/1076-898X.6.4.322"},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/3626765"},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.14778\/3554821.3554843"},{"key":"e_1_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.5591\/978--1--57735--516--8\/IJCAI11--222"},{"key":"e_1_2_2_28_1","volume-title":"2008 IEEE Conference on Computer Vision and Pattern Recognition. 1--8.","author":"Jain P.","unstructured":"P. Jain, B. Kulis, and K. Grauman. 2008-06. Fast image search for learned metrics. In 2008 IEEE Conference on Computer Vision and Pattern Recognition. 1--8."},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/2534169.2486028"},{"key":"e_1_2_2_30_1","volume-title":"Ravishankar Krishnawamy, and Rohan Kadekodi.","author":"Subramanya Suhas Jayaram","year":"2019","unstructured":"Suhas Jayaram Subramanya, Fnu Devvrit, Harsha Vardhan Simhadri, Ravishankar Krishnawamy, and Rohan Kadekodi. 2019. DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node. In Advances in Neural Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alch\u00e9-Buc, E. Fox, and R. Garnett (Eds.), Vol. 32. Curran Associates, Inc. https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2019\/file\/09853c7fb1d3f8ee67a61b6bf4a7f8e6-Paper.pdf"},{"key":"e_1_2_2_31_1","volume-title":"Product quantization for nearest neighbor search","author":"Jegou Herve","year":"2010","unstructured":"Herve Jegou, Matthijs Douze, and Cordelia Schmid. 2010. Product quantization for nearest neighbor search. IEEE transactions on pattern analysis and machine intelligence 33, 1 (2010), 117--128."},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2011.5946540"},{"key":"e_1_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.298"},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/2396761.2396831"},{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2009.5459466"},{"key":"e_1_2_2_36_1","unstructured":"Daniel LeJeune Richard G. Baraniuk and Reinhard Heckel. 2019. Adaptive Estimation for Approximate k-Nearest-Neighbor Computations."},{"key":"e_1_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3539618.3591651"},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3064009"},{"key":"e_1_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3284028.3284030"},{"key":"e_1_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3447548.3467101"},{"key":"e_1_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3366423.3380151"},{"key":"e_1_2_2_42_1","volume-title":"NIPS","author":"Liu Ting","year":"2004","unstructured":"Ting Liu, Andrew W. Moore, Alexander Gray, and Ke Yang. December 13--18, 2004. An investigation of practical approximate nearest neighbor algorithms. In Advances in Neural Information Processing Systems 17 [Neural Information Processing Systems, NIPS 2004. Vancouver, British Columbia, Canada, 825--832."},{"key":"e_1_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3620665.3640360"},{"key":"e_1_2_2_44_1","volume-title":"Hnswlib: Fast approximate nearest neighbor search. https:\/\/github.com\/nmslib\/hnswlib. Accessed: 2024--10--15.","author":"Malkov Yury","year":"2020","unstructured":"Yury Malkov. 2020. Hnswlib: Fast approximate nearest neighbor search. https:\/\/github.com\/nmslib\/hnswlib. Accessed: 2024--10--15."},{"key":"e_1_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2018.2889473"},{"key":"e_1_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00272"},{"key":"e_1_2_2_47_1","unstructured":"Tomas Mikolov Kai Chen Greg Corrado and Jeffrey Dean. 2013. Efficient Estimation of Word Representations in Vector Space. arXiv:1301.3781 [cs.CL."},{"key":"e_1_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v24i1.7683"},{"key":"e_1_2_2_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2014.2321376"},{"key":"e_1_2_2_50_1","volume-title":"10th USENIX Symposium on Networked Systems Design and Implementation (NSDI 13","author":"Nishtala Rajesh","year":"2013","unstructured":"Rajesh Nishtala, Hans Fugal, Steven Grimm, Marc Kwiatkowski, Herman Lee, Harry C. Li, Ryan McElroy, Mike Paleczny, Daniel Peek, and Paul Saab. 2013. Scaling memcache at facebook. In 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI 13. 385--398."},{"key":"e_1_2_2_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE53745.2022.00268"},{"key":"e_1_2_2_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.19"},{"key":"e_1_2_2_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/3572848.3577527"},{"key":"e_1_2_2_54_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14--1162"},{"key":"e_1_2_2_55_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11633-017--1054--2"},{"key":"e_1_2_2_56_1","volume-title":"Proceedings ofthe 34th International Conference on Neural Information Processing Systems 895","author":"Ren Jie","year":"2020","unstructured":"Jie Ren, Minjia Zhang, and Dong Li. 2020. HM-ANN: Efficient Billion Point Nearest Neighbor Search on Heterogeneous Memory. Proceedings ofthe 34th International Conference on Neural Information Processing Systems 895 (2020), 20."},{"key":"e_1_2_2_57_1","doi-asserted-by":"publisher","DOI":"10.1137\/22M1525193"},{"key":"e_1_2_2_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.327"},{"key":"e_1_2_2_59_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-89657-7_3"},{"key":"e_1_2_2_60_1","doi-asserted-by":"publisher","DOI":"10.14778\/2556549.2556574"},{"key":"e_1_2_2_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/3677134"},{"key":"e_1_2_2_62_1","volume-title":"The relative neighbourhood graph of a finite planar set. Pattern recognition 12, 4","author":"Toussaint Godfried T.","year":"1980","unstructured":"Godfried T. Toussaint. 1980. The relative neighbourhood graph of a finite planar set. Pattern recognition 12, 4 (1980), 261--268."},{"key":"e_1_2_2_63_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2012.6247790"},{"key":"e_1_2_2_64_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2013.125"},{"key":"e_1_2_2_65_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3457550"},{"key":"e_1_2_2_66_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2017.2699960"},{"key":"e_1_2_2_67_1","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN.2011.6033373"},{"key":"e_1_2_2_68_1","doi-asserted-by":"publisher","DOI":"10.1109\/CCGrid51090.2021.00023"},{"key":"e_1_2_2_69_1","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2019.00061"},{"key":"e_1_2_2_70_1","doi-asserted-by":"publisher","DOI":"10.14778\/3415478.3415541"},{"key":"e_1_2_2_71_1","unstructured":"Yair Weiss Antonio Torralba and Rob Fergus. 2009. Spectral hashing. In Advances in neural information processing systems. 1753--1760."},{"key":"e_1_2_2_72_1","doi-asserted-by":"publisher","DOI":"10.1145\/3485447.3511957"},{"key":"e_1_2_2_73_1","doi-asserted-by":"publisher","DOI":"10.14778\/3665844.3665852"},{"key":"e_1_2_2_74_1","doi-asserted-by":"publisher","DOI":"10.1145\/3600006.3613166"},{"key":"e_1_2_2_75_1","doi-asserted-by":"publisher","DOI":"10.1145\/3613424.3614292"},{"key":"e_1_2_2_76_1","volume-title":"17th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2023","author":"Zhang Qianxi","year":"2023","unstructured":"Qianxi Zhang, Shuotao Xu, Qi Chen, Guoxin Sui, Jiadong Xie, Zhizhen Cai, Yaoqi Chen, Yinxuan He, Yuqing Yang, Fan Yang, Mao Yang, and Lidong Zhou. 2023. VBASE: Unifying Online Vector Similarity Search and Relational Queries via Relaxed Monotonicity. In 17th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2023, Boston, MA, USA, July 10--12, 2023, Roxana Geambasu and Ed Nightingale (Eds.). USENIX Association, 377--395. https:\/\/www.usenix.org\/conference\/osdi23\/presentation\/zhang-qianxi"},{"key":"e_1_2_2_77_1","volume-title":"Proceedings ofthe 31th International Conference on Machine Learning (ICML","volume":"32","author":"Zhang Ting","year":"2014","unstructured":"Ting Zhang, Chao Du, and Jingdong Wang. 2014. Composite quantization for approximate nearest neighbor search. In Proceedings ofthe 31th International Conference on Machine Learning (ICML, Vol. 32. 838--846."}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3709743","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3709743","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T18:18:21Z","timestamp":1774981101000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3709743"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,2,10]]},"references-count":77,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,2,10]]}},"alternative-id":["10.1145\/3709743"],"URL":"https:\/\/doi.org\/10.1145\/3709743","relation":{},"ISSN":["2836-6573"],"issn-type":[{"value":"2836-6573","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,2,10]]}}}