{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,2]],"date-time":"2026-07-02T23:45:21Z","timestamp":1783035921011,"version":"3.54.6"},"reference-count":128,"publisher":"Association for Computing Machinery (ACM)","issue":"3","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2025,6,17]]},"abstract":"<jats:p>In high-dimensional vector spaces, Approximate Nearest Neighbor Search (ANNS) is a key component in database and artificial intelligence infrastructures. Graph-based ANNS methods, particularly HNSW, have emerged as leading solutions, offering an impressive trade-off between search efficiency and accuracy. Many vector databases utilize graph indexes as their core algorithms, benefiting from various optimizations to enhance search performance. However, the high indexing time associated with graph algorithms poses a significant challenge, especially given the increasing volume of data, query processing complexity, and dynamic index maintenance demand. This has rendered indexing time a critical performance metric for users.<\/jats:p>\n                  <jats:p>\n                    In this paper, we comprehensively analyze the underlying causes of the low graph indexing efficiency on modern CPUs, identifying that distance computation dominates indexing time, primarily due to high memory access latency and suboptimal arithmetic operation efficiency. We demonstrate that distance comparisons during index construction can be effectively performed using compact vector codes at an appropriate compression error. Drawing from insights gained through integrating existing compact coding methods in the graph indexing process, we propose a novel compact coding strategy, named\n                    <jats:italic toggle=\"yes\">Flash,<\/jats:italic>\n                    designed explicitly for graph indexing and optimized for modern CPU architectures. By minimizing random memory accesses and maximizing the utilization of SIMD (Single Instruction, Multiple Data) instructions,\n                    <jats:italic toggle=\"yes\">Flash<\/jats:italic>\n                    significantly enhances cache hit rates and arithmetic operations. Extensive experiments conducted on eight real-world datasets, ranging from ten million to one billion vectors, exhibit that\n                    <jats:italic toggle=\"yes\">Flash<\/jats:italic>\n                    achieves a speedup of 10.4\u00d7 to 22.9\u00d7 in index construction efficiency, while maintaining or improving search performance.\n                  <\/jats:p>","DOI":"10.1145\/3725260","type":"journal-article","created":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T21:23:29Z","timestamp":1750281809000},"page":"1-29","source":"Crossref","is-referenced-by-count":6,"title":["Accelerating Graph Indexing for ANNS on Modern CPUs"],"prefix":"10.1145","volume":"3","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3806-1012","authenticated-orcid":false,"given":"Mengzhao","family":"Wang","sequence":"first","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-0837-7042","authenticated-orcid":false,"given":"Haotian","family":"Wu","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8082-7398","authenticated-orcid":false,"given":"Xiangyu","family":"Ke","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3816-8450","authenticated-orcid":false,"given":"Yunjun","family":"Gao","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4555-6232","authenticated-orcid":false,"given":"Yifan","family":"Zhu","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-2689-6020","authenticated-orcid":false,"given":"Wenchao","family":"Zhou","sequence":"additional","affiliation":[{"name":"Alibaba Group, Hangzhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2025,6,18]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Eigen is a C template library for linear algebra: matrices, vectors, numerical solvers, and related algorithms. https:\/\/eigen.tuxfamily.org\/index.php?title=Main_Page. [Online","year":"2024","unstructured":"2009. Eigen is a C template library for linear algebra: matrices, vectors, numerical solvers, and related algorithms. https:\/\/eigen.tuxfamily.org\/index.php?title=Main_Page. [Online; accessed 01-June-2024]."},{"key":"e_1_2_1_2_1","volume-title":"https:\/\/github.com\/nmslib\/hnswlib. [Online","year":"2024","unstructured":"2017. Header-only C\/python library for fast approximate nearest neighbors. https:\/\/github.com\/nmslib\/hnswlib. [Online; accessed 25-July-2024]."},{"key":"e_1_2_1_3_1","volume-title":"A Library for Efficient Similarity Search and Clustering of Dense Vectors. https:\/\/github.com\/facebookresearch\/faiss. [Online","year":"2024","unstructured":"2017. A Library for Efficient Similarity Search and Clustering of Dense Vectors. https:\/\/github.com\/facebookresearch\/faiss. [Online; accessed 25-July-2024]."},{"key":"e_1_2_1_4_1","volume-title":"TOROS N2 - lightweight approximate Nearest Neighbor library which runs fast even with large datasets. https:\/\/github.com\/kakao\/n2. [Online","year":"2024","unstructured":"2017. TOROS N2 - lightweight approximate Nearest Neighbor library which runs fast even with large datasets. https:\/\/github.com\/kakao\/n2. [Online; accessed 25-July-2024]."},{"key":"e_1_2_1_5_1","volume-title":"Possible to continue setting data after building index? https:\/\/github.com\/nmslib\/nmslib\/issues\/73. [Online","year":"2024","unstructured":"2018. Possible to continue setting data after building index? https:\/\/github.com\/nmslib\/nmslib\/issues\/73. [Online; accessed 20-September-2024]."},{"key":"e_1_2_1_6_1","volume-title":"Billion-Scale Approximate Nearest Neighbor Search Challenge: NeurIPS'21 competition track. https:\/\/big-ann-benchmarks.com\/neurips21.html. [Online","year":"2024","unstructured":"2021. Billion-Scale Approximate Nearest Neighbor Search Challenge: NeurIPS'21 competition track. https:\/\/big-ann-benchmarks.com\/neurips21.html. [Online; accessed 01-June-2024]."},{"key":"e_1_2_1_7_1","volume-title":"https:\/\/huggingface.co\/datasets\/Cohere\/wikipedia-22--12-es-embeddings. [Online","year":"2024","unstructured":"2023. Cohere\/wikipedia-22--12-es-embeddings. https:\/\/huggingface.co\/datasets\/Cohere\/wikipedia-22--12-es-embeddings. [Online; accessed 23-September-2024]."},{"key":"e_1_2_1_8_1","volume-title":"kinianlo\/imagenet_embeddings. https:\/\/huggingface.co\/datasets\/kinianlo\/imagenet_embeddings. [Online","year":"2024","unstructured":"2023. kinianlo\/imagenet_embeddings. https:\/\/huggingface.co\/datasets\/kinianlo\/imagenet_embeddings. [Online; accessed 23-September-2024]."},{"key":"e_1_2_1_9_1","volume-title":"nielsr\/datacomp-small-with-embeddings-and-cluster-labels. https:\/\/huggingface.co\/datasets\/nielsr\/datacomp-small-with-embeddings-and-cluster-labels. [Online","year":"2024","unstructured":"2023. nielsr\/datacomp-small-with-embeddings-and-cluster-labels. https:\/\/huggingface.co\/datasets\/nielsr\/datacomp-small-with-embeddings-and-cluster-labels. [Online; accessed 23-September-2024]."},{"key":"e_1_2_1_10_1","volume-title":"Practices & More | Qdrant. https:\/\/qdrant.tech\/articles\/scalar-quantization. [Online","year":"2024","unstructured":"2023. Scalar Quantization: Background, Practices & More | Qdrant. https:\/\/qdrant.tech\/articles\/scalar-quantization. [Online; accessed 02-August-2024]."},{"key":"e_1_2_1_11_1","volume-title":"Vector search in Elasticsearch: The rationale behind the design. https:\/\/www.elastic.co\/search-labs\/blog\/vector-search-elasticsearch-rationale. [Online","year":"2024","unstructured":"2023. Vector search in Elasticsearch: The rationale behind the design. https:\/\/www.elastic.co\/search-labs\/blog\/vector-search-elasticsearch-rationale. [Online; accessed 14-July-2024]."},{"key":"e_1_2_1_12_1","volume-title":"Research Papers in ICDE'24","year":"2024","unstructured":"2024. Accepted Research Papers in ICDE'24. https:\/\/icde2024.github.io\/papers.html. [Online; accessed 14-July-2024]."},{"key":"e_1_2_1_13_1","volume-title":"Research Papers in SIGMOD'24","year":"2024","unstructured":"2024. Accepted Research Papers in SIGMOD'24. https:\/\/2024.sigmod.org\/sigmod-list.html. [Online; accessed 14-July-2024]."},{"key":"e_1_2_1_14_1","volume-title":"Research Papers in VLDB'24","year":"2024","unstructured":"2024. Accepted Research Papers in VLDB'24. https:\/\/vldb.org\/pvldb\/volumes\/17. [Online; accessed 14-July-2024]."},{"key":"e_1_2_1_15_1","volume-title":"anton-l\/wiki-embed-mxbai-embed-large-v1. https:\/\/huggingface.co\/datasets\/anton-l\/wiki-embed-mxbai-embed-large-v1. [Online","year":"2024","unstructured":"2024. anton-l\/wiki-embed-mxbai-embed-large-v1. https:\/\/huggingface.co\/datasets\/anton-l\/wiki-embed-mxbai-embed-large-v1. [Online; accessed 21-September-2024]."},{"key":"e_1_2_1_16_1","volume-title":"argilla-warehouse\/personahub-fineweb-edu-4-embeddings. https:\/\/huggingface.co\/datasets\/argilla-warehouse\/personahub-fineweb-edu-4-embeddings. [Online","year":"2024","unstructured":"2024. argilla-warehouse\/personahub-fineweb-edu-4-embeddings. https:\/\/huggingface.co\/datasets\/argilla-warehouse\/personahub-fineweb-edu-4-embeddings. [Online; accessed 23-September-2024]."},{"key":"e_1_2_1_17_1","volume-title":"bigcode\/stack-exchange-embeddings-20230914. https:\/\/huggingface.co\/datasets\/bigcode\/stack-exchange-embeddings-20230914. [Online","year":"2024","unstructured":"2024. bigcode\/stack-exchange-embeddings-20230914. https:\/\/huggingface.co\/datasets\/bigcode\/stack-exchange-embeddings-20230914. [Online; accessed 23-September-2024]."},{"key":"e_1_2_1_18_1","volume-title":"Hierarchical Navigable Small Worlds (HNSW). https:\/\/www.pinecone.io\/learn\/series\/faiss\/hnsw\/. [Online","year":"2024","unstructured":"2024. Hierarchical Navigable Small Worlds (HNSW). https:\/\/www.pinecone.io\/learn\/series\/faiss\/hnsw\/. [Online; accessed 14-July-2024]."},{"key":"e_1_2_1_19_1","volume-title":"hnsqlite. https:\/\/github.com\/jiggy-ai\/hnsqlite. [Online","year":"2024","unstructured":"2024. hnsqlite. https:\/\/github.com\/jiggy-ai\/hnsqlite. [Online; accessed 20-August-2024]."},{"key":"e_1_2_1_20_1","volume-title":"https:\/\/huggingface.co\/datasets. [Online","author":"Face Hugging","year":"2024","unstructured":"2024. Hugging Face. https:\/\/huggingface.co\/datasets. [Online; accessed 01-June-2024]."},{"key":"e_1_2_1_21_1","volume-title":"Intel\u00ae Intrinsics Guide. https:\/\/www.intel.com\/content\/www\/us\/en\/docs\/intrinsics-guide\/index.html#text=shuffle_epi32&ig_expand=6003,6000,5997,6000,5999,5998. [Online","year":"2024","unstructured":"2024. Intel\u00ae Intrinsics Guide. https:\/\/www.intel.com\/content\/www\/us\/en\/docs\/intrinsics-guide\/index.html#text=shuffle_epi32&ig_expand=6003,6000,5997,6000,5999,5998. [Online; accessed 23-September-2024]."},{"key":"e_1_2_1_22_1","volume-title":"Intel\u00ae Intrinsics Guide. https:\/\/www.intel.com\/content\/www\/us\/en\/docs\/intrinsics-guide\/index.html#techs=AVX_ALL,AVX_512&text=loadu&ig_expand=4108,4110. [Online","year":"2024","unstructured":"2024. Intel\u00ae Intrinsics Guide. https:\/\/www.intel.com\/content\/www\/us\/en\/docs\/intrinsics-guide\/index.html#techs=AVX_ALL,AVX_512&text=loadu&ig_expand=4108,4110. [Online; accessed 15-December-2024]."},{"key":"e_1_2_1_23_1","volume-title":"https:\/\/milvus.io\/docs\/knowhere.md. [Online","year":"2024","unstructured":"2024. Knowhere. https:\/\/milvus.io\/docs\/knowhere.md. [Online; accessed 01-September-2024]."},{"key":"e_1_2_1_24_1","volume-title":"The LAION2B and projections. https:\/\/sisap-challenges.github.io\/2024\/datasets\/. [Online","year":"2024","unstructured":"2024. The LAION2B and projections. https:\/\/sisap-challenges.github.io\/2024\/datasets\/. [Online; accessed 23-September-2024]."},{"key":"e_1_2_1_25_1","volume-title":"New embedding models and API updates. https:\/\/openai.com\/index\/new-embedding-models-and-api-updates\/. [Online","year":"2024","unstructured":"2024. New embedding models and API updates. https:\/\/openai.com\/index\/new-embedding-models-and-api-updates\/. [Online; accessed 20-September-2024]."},{"key":"e_1_2_1_26_1","volume-title":"Performance analysis tools based on Linux perf_events (aka perf) and ftrace. https:\/\/github.com\/brendangregg\/perf-tools. [Online","year":"2024","unstructured":"2024. Performance analysis tools based on Linux perf_events (aka perf) and ftrace. https:\/\/github.com\/brendangregg\/perf-tools. [Online; accessed 23-September-2024]."},{"key":"e_1_2_1_27_1","volume-title":"Step-by-Step Guide to Choosing the Best Embedding Model for Your Application. https:\/\/weaviate.io\/blog\/how-to-choose-an-embedding-model. [Online","year":"2024","unstructured":"2024. Step-by-Step Guide to Choosing the Best Embedding Model for Your Application. https:\/\/weaviate.io\/blog\/how-to-choose-an-embedding-model. [Online; accessed 20-September-2024]."},{"key":"e_1_2_1_28_1","volume-title":"What Goes Around Comes Around. . . And Around. . . . https:\/\/sigmodrecord.org\/publications\/sigmodRecord\/2406\/pdfs\/04_Surveys_Stonebraker.pdf. [Online","year":"2024","unstructured":"2024. What Goes Around Comes Around. . . And Around. . . . https:\/\/sigmodrecord.org\/publications\/sigmodRecord\/2406\/pdfs\/04_Surveys_Stonebraker.pdf. [Online; accessed 16-July-2024]."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.14778\/3611479.3611537"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.14778\/2856318.2856324"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3078971.3078992"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.14778\/3204028.3204034"},{"key":"e_1_2_1_33_1","first-page":"89","article-title":"Recent Approaches and Trends in Approximate Nearest Neighbor Search, with Remarks on Benchmarking","volume":"46","author":"Aum\u00fcller Martin","year":"2023","unstructured":"Martin Aum\u00fcller and Matteo Ceccarello. 2023. Recent Approaches and Trends in Approximate Nearest Neighbor Search, with Remarks on Benchmarking. IEEE Data Engineering Bulletin 46, 3 (2023), 89--105.","journal-title":"IEEE Data Engineering Bulletin"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.14778\/3583140.3583166"},{"key":"e_1_2_1_35_1","volume-title":"Proceedings of the 36th International Conference on Machine Learning(ICML)","volume":"97","author":"Baranchuk Dmitry","year":"2019","unstructured":"Dmitry Baranchuk, Dmitry Persiyanov, Anton Sinitsin, and Artem Babenko. 2019. Learning to Route in Similarity Graphs. In Proceedings of the 36th International Conference on Machine Learning(ICML), Vol. 97. 475--484."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.2011.2173899"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.14778\/3685800.3685805"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.14778\/3681954.3681959"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3543507.3583318"},{"key":"e_1_2_1_40_1","unstructured":"Qi Chen Bing Zhao Haidong Wang Mingqin Li Chuanjie Liu Zengzhong Li Mao Yang and Jingdong Wang. 2021. SPANN: Highly-efficient Billion-scale Approximate Nearest Neighborhood Search. In Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems (NeurIPS). 5199--5212."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3511808.3557098"},{"key":"e_1_2_1_42_1","volume-title":"Graph Reordering for Cache-Efficient Near Neighbor Search. In Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems (NeurIPS).","author":"Coleman Benjamin","year":"2022","unstructured":"Benjamin Coleman, Santiago Segarra, Alexander J. Smola, and Anshumali Shrivastava. 2022. Graph Reordering for Cache-Efficient Near Neighbor Search. In Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems (NeurIPS)."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/BigData47090.2019.9006219"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/1963405.1963487"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.14778\/3503585.3503594"},{"key":"e_1_2_1_46_1","volume-title":"The faiss library. arXiv:2401.08281","author":"Douze Matthijs","year":"2024","unstructured":"Matthijs Douze, Alexandr Guzhva, Chengqi Deng, Jeff Johnson, Gergely Szilvasy, Pierre-Emmanuel Mazar\u00e9, Maria Lomeli, Lucas Hosseini, and Herv\u00e9 J\u00e9gou. 2024. The faiss library. arXiv:2401.08281 (2024)."},{"key":"e_1_2_1_47_1","volume-title":"Kimia","author":"Foster Cole","year":"2023","unstructured":"Cole Foster and Benjamin B. Kimia. 2023. Computational Enhancements of HNSW Targeted to Very Large Datasets. In Similarity Search and Applications - 16th International Conference (SISAP), Vol. 14289. 291--299."},{"key":"e_1_2_1_48_1","first-page":"4139","article-title":"High Dimensional Similarity Search with Satellite System Graph: Efficiency, Scalability, and Unindexed Query Compatibility","volume":"44","author":"Fu Cong","year":"2021","unstructured":"Cong Fu, Changxu Wang, and Deng Cai. 2021. High Dimensional Similarity Search with Satellite System Graph: Efficiency, Scalability, and Unindexed Query Compatibility. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 44, 8 (2021), 4139--4150.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.14778\/3303753.3303754"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/3589282"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/3654970"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2013.379"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/3543507.3583552"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11704-023-2678-8"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1109\/TBDATA.2022.3161156"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.14778\/3554821.3554843"},{"key":"e_1_2_1_57_1","volume-title":"Proceedings of the 37th International Conference on Machine Learning (ICML). 3887--3896","author":"Guo Ruiqi","year":"2020","unstructured":"Ruiqi Guo, Philip Sun, Erik Lindgren, Quan Geng, David Simcha, Felix Chern, and Sanjiv Kumar. 2020. Accelerating Large-Scale Inference with Anisotropic Vector Quantization. In Proceedings of the 37th International Conference on Machine Learning (ICML). 3887--3896."},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1007\/s41019-023-00219-6"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.461"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.14778\/2850469.2850470"},{"key":"e_1_2_1_61_1","volume-title":"Neos: A NVMe-GPUs Direct Vector Service Buffer in User Space. In IEEE 40th International Conference on Data Engineering (ICDE). 3767--3781","author":"Huang Yuchen","year":"2024","unstructured":"Yuchen Huang, Xiaopeng Fan, Song Yan, and Chuliang Weng. 2024. Neos: A NVMe-GPUs Direct Vector Service Buffer in User Space. In IEEE 40th International Conference on Data Engineering (ICDE). 3767--3781."},{"key":"e_1_2_1_62_1","doi-asserted-by":"crossref","unstructured":"Piotr Indyk and Haike Xu. 2023. Worst-case Performance of Popular Approximate Nearest Neighbor Search Implementations: Guarantees and Limitations. In Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems (NeurIPS).","DOI":"10.52202\/075280-2891"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1145\/3624062.3625132"},{"key":"e_1_2_1_64_1","volume-title":"Optimization of Indexing Based on k-Nearest Neighbor Graph for Proximity Search in High-dimensional Data. arXiv:1810.07355","author":"Iwasaki Masajiro","year":"2018","unstructured":"Masajiro Iwasaki and Daisuke Miyazaki. 2018. Optimization of Indexing Based on k-Nearest Neighbor Graph for Proximity Search in High-dimensional Data. arXiv:1810.07355 (2018)."},{"key":"e_1_2_1_65_1","volume-title":"Proceedings of the 2023 USENIX Annual Technical Conference (USENIX ATC). 585--600","author":"Jang Junhyeok","year":"2023","unstructured":"Junhyeok Jang, Hanjin Choi, Hanyeoreum Bae, Seungjun Lee, Miryeong Kwon, and Myoungsoo Jung. 2023. CXL-ANNS: Software-Hardware Collaborative Memory Disaggregation and Computation for Billion-Scale Approximate Nearest Neighbor Search. In Proceedings of the 2023 USENIX Annual Technical Conference (USENIX ATC). 585--600."},{"key":"e_1_2_1_66_1","volume-title":"Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems (NeurIPS). 13748--13758","author":"Subramanya Suhas Jayaram","year":"2019","unstructured":"Suhas Jayaram Subramanya, Fnu Devvrit, Harsha Vardhan Simhadri, Ravishankar Krishnawamy, and Rohan Kadekodi. 2019. DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems (NeurIPS). 13748--13758."},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2010.57"},{"key":"e_1_2_1_68_1","volume-title":"Accelerating Graph-based Vector Search via Delayed-Synchronization Traversal. arXiv:2406.12385","author":"Jiang Wenqi","year":"2024","unstructured":"Wenqi Jiang, Hang Hu, Torsten Hoefler, and Gustavo Alonso. 2024. Accelerating Graph-based Vector Search via Delayed-Synchronization Traversal. arXiv:2406.12385 (2024)."},{"key":"e_1_2_1_69_1","first-page":"20150202","article-title":"Principal component analysis: a review and recent developments. Philosophical transactions of the royal society A","volume":"374","author":"Jolliffe Ian T","year":"2016","unstructured":"Ian T Jolliffe and Jorge Cadima. 2016. Principal component analysis: a review and recent developments. Philosophical transactions of the royal society A: Mathematical, Physical and Engineering Sciences 374, 2065 (2016), 20150202.","journal-title":"Mathematical, Physical and Engineering Sciences"},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-46994-7_12"},{"key":"e_1_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3380600"},{"key":"e_1_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.1145\/3284028.3284030"},{"key":"e_1_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2019.2909204"},{"key":"e_1_2_1_74_1","volume-title":"Optimizing Graph-based Approximate Nearest Neighbor Search: Stronger and Smarter. In 23rd IEEE International Conference on Mobile Data Management (MDM). 179--184","author":"Liu Jun","year":"2022","unstructured":"Jun Liu, Zhenhua Zhu, Jingbo Hu, Hanbo Sun, Li Liu, Lingzhi Liu, Guohao Dai, Huazhong Yang, and Yu Wang. 2022. Optimizing Graph-based Approximate Nearest Neighbor Search: Stronger and Smarter. In 23rd IEEE International Conference on Mobile Data Management (MDM). 179--184."},{"key":"e_1_2_1_75_1","doi-asserted-by":"publisher","DOI":"10.14778\/3489496.3489506"},{"key":"e_1_2_1_76_1","doi-asserted-by":"publisher","DOI":"10.14778\/3397230.3397240"},{"key":"e_1_2_1_77_1","volume-title":"Probabilistic Routing for Graph-Based Approximate Nearest Neighbor Search. arXiv:2402.11354","author":"Lu Kejing","year":"2024","unstructured":"Kejing Lu, Chuan Xiao, and Yoshiharu Ishikawa. 2024. Probabilistic Routing for Graph-Based Approximate Nearest Neighbor Search. arXiv:2402.11354 (2024)."},{"key":"e_1_2_1_78_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2018.2889473"},{"key":"e_1_2_1_79_1","doi-asserted-by":"publisher","DOI":"10.14778\/3489496.3489511"},{"key":"e_1_2_1_80_1","doi-asserted-by":"publisher","DOI":"10.1145\/3323873.3325014"},{"key":"e_1_2_1_81_1","volume-title":"Zanoni Dias, and Ricardo da Silva Torres.","author":"Vargas Mu\u00f1oz Javier Alvaro","year":"2019","unstructured":"Javier Alvaro Vargas Mu\u00f1oz, Marcos Andr\u00e9 Gon\u00e7alves, Zanoni Dias, and Ricardo da Silva Torres. 2019. Hierarchical Clustering-Based Graphs for Large Scale Approximate Nearest Neighbor Search. Pattern Recognition (PR) 96 (2019)."},{"key":"e_1_2_1_82_1","volume-title":"CAGRA: Highly Parallel Graph Construction and Approximate Nearest Neighbor Search for GPUs. In IEEE 40th International Conference on Data Engineering (ICDE). 4236--4247","author":"Ootomo Hiroyuki","year":"2024","unstructured":"Hiroyuki Ootomo, Akira Naruse, Corey Nolet, Ray Wang, Tamas Feher, and Yong Wang. 2024. CAGRA: Highly Parallel Graph Construction and Approximate Nearest Neighbor Search for GPUs. In IEEE 40th International Conference on Data Engineering (ICDE). 4236--4247."},{"key":"e_1_2_1_83_1","doi-asserted-by":"publisher","DOI":"10.1145\/3626246.3654691"},{"key":"e_1_2_1_84_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-024-00864-x"},{"key":"e_1_2_1_85_1","first-page":"120","article-title":"ACORN: Performant and Predicate-Agnostic Search Over Vector Embeddings and Structured Data","volume":"2","author":"Patel Liana","year":"2024","unstructured":"Liana Patel, Peter Kraft, Carlos Guestrin, and Matei Zaharia. 2024. ACORN: Performant and Predicate-Agnostic Search Over Vector Embeddings and Structured Data. Proceedings of the ACM on Management of Data (PACMMOD) 2, 3 (2024), 120.","journal-title":"Proceedings of the ACM on Management of Data (PACMMOD)"},{"key":"e_1_2_1_86_1","volume-title":"The Many Facets of Approximate Similarity Search. In First International Workshop on Similarity Search and Applications (SISAP). 10--21","author":"Patella Marco","year":"2008","unstructured":"Marco Patella and Paolo Ciaccia. 2008. The Many Facets of Approximate Similarity Search. In First International Workshop on Similarity Search and Applications (SISAP). 10--21."},{"key":"e_1_2_1_87_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCAD51958.2021.9643528"},{"key":"e_1_2_1_88_1","doi-asserted-by":"publisher","DOI":"10.1145\/3588908"},{"key":"e_1_2_1_89_1","doi-asserted-by":"publisher","DOI":"10.1145\/3572848.3577527"},{"key":"e_1_2_1_90_1","volume-title":"Deep Learning: Algorithms, Techniques, and Applications. ACM Computing Surveys (CSUR) 51, 5","author":"Pouyanfar Samira","year":"2019","unstructured":"Samira Pouyanfar, Saad Sadiq, Yilin Yan, Haiman Tian, Yudong Tao, Maria E. Presa Reyes, Mei-Ling Shyu, Shu-Ching Chen, and S. S. Iyengar. 2019. A Survey on Deep Learning: Algorithms, Techniques, and Applications. ACM Computing Surveys (CSUR) 51, 5 (2019), 92:1--92:36."},{"key":"e_1_2_1_91_1","unstructured":"Jie Ren Minjia Zhang and Dong Li. 2020. HM-ANN: Efficient Billion-Point Nearest Neighbor Search on Heterogeneous Memory. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems (NeurIPS)."},{"key":"e_1_2_1_92_1","doi-asserted-by":"publisher","DOI":"10.1145\/3128571"},{"key":"e_1_2_1_93_1","volume-title":"Ravishankar Krishnaswamy, and Harsha Vardhan Simhadri.","author":"Singh Aditi","year":"2021","unstructured":"Aditi Singh, Suhas Jayaram Subramanya, Ravishankar Krishnaswamy, and Harsha Vardhan Simhadri. 2021. Freshdiskann: A fast and accurate graph-based ann index for streaming similarity search. arXiv:2105.09613 (2021)."},{"key":"e_1_2_1_94_1","doi-asserted-by":"publisher","DOI":"10.14778\/2556549.2556574"},{"key":"e_1_2_1_95_1","volume-title":"Efficient and accurate nearest neighbor and closest pair search in high-dimensional space. TODS 35, 3","author":"Tao Yufei","year":"2010","unstructured":"Yufei Tao, Ke Yi, Cheng Sheng, and Panos Kalnis. 2010. Efficient and accurate nearest neighbor and closest pair search in high-dimensional space. TODS 35, 3 (2010), 20:1--20:46."},{"key":"e_1_2_1_96_1","volume-title":"Cecilia Aguerrebere, Mark Hildebrand, and Ted Willke.","author":"Tepper Mariano","year":"2023","unstructured":"Mariano Tepper, Ishwar Singh Bhati, Cecilia Aguerrebere, Mark Hildebrand, and Ted Willke. 2023. LeanVec: Search your vectors faster by making them fit. arXiv:2312.16335 (2023)."},{"key":"e_1_2_1_97_1","volume-title":"Proceedings of the 2024 USENIX Annual Technical Conference (USENIX ATC). 1135--1150","author":"Tian Bing","year":"2024","unstructured":"Bing Tian, Haikun Liu, Zhuohui Duan, Xiaofei Liao, Hai Jin, and Yu Zhang. 2024. Scalable Billion-point Approximate Nearest Neighbor Search Using SmartSSDs. In Proceedings of the 2024 USENIX Annual Technical Conference (USENIX ATC). 1135--1150."},{"key":"e_1_2_1_98_1","doi-asserted-by":"publisher","DOI":"10.1145\/3459637.3482344"},{"key":"e_1_2_1_99_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2015.2487976"},{"key":"e_1_2_1_100_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3457550"},{"key":"e_1_2_1_101_1","volume-title":"Navigable Proximity Graph-Driven Native Hybrid Queries with Structured and Unstructured Constraints. arXiv:2203.13601","author":"Wang Mengzhao","year":"2022","unstructured":"Mengzhao Wang, Lingwei Lv, Xiaoliang Xu, Yuxiang Wang, Qiang Yue, and Jiongkang Ni. 2022. Navigable Proximity Graph-Driven Native Hybrid Queries with Structured and Unstructured Constraints. arXiv:2203.13601 (2022)."},{"key":"e_1_2_1_102_1","doi-asserted-by":"crossref","unstructured":"Mengzhao Wang Lingwei Lv Xiaoliang Xu Yuxiang Wang Qiang Yue and Jiongkang Ni. 2023. An Efficient and Robust Framework for Approximate Nearest Neighbor Search with Attribute Constraint. In Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems (NeurIPS). 15738--15751.","DOI":"10.52202\/075280-0692"},{"key":"e_1_2_1_103_1","doi-asserted-by":"publisher","DOI":"10.1145\/3639269"},{"key":"e_1_2_1_104_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476249.3476255"},{"key":"e_1_2_1_105_1","doi-asserted-by":"publisher","DOI":"10.14778\/3424573.3424580"},{"key":"e_1_2_1_106_1","doi-asserted-by":"publisher","DOI":"10.14778\/3565816.3565819"},{"key":"e_1_2_1_107_1","volume-title":"et al","author":"Wang Zeyu","year":"2024","unstructured":"Zeyu Wang, Haoran Xiong, Zhenying He, Peng Wang, et al . 2024. Distance Comparison Operators for Approximate Nearest Neighbor Search: Exploration and Benchmark. arXiv:2403.13491 (2024)."},{"key":"e_1_2_1_108_1","doi-asserted-by":"publisher","DOI":"10.14778\/3415478.3415541"},{"key":"e_1_2_1_109_1","doi-asserted-by":"publisher","DOI":"10.1145\/3477495.3531799"},{"key":"e_1_2_1_110_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2021.107305"},{"key":"e_1_2_1_111_1","doi-asserted-by":"publisher","DOI":"10.1145\/3600006.3613166"},{"key":"e_1_2_1_112_1","volume-title":"Tao: A Learning Framework for Adaptive Nearest Neighbor Search using Static Features Only. arXiv:2110.00696","author":"Yang Kaixiang","year":"2021","unstructured":"Kaixiang Yang, Hongya Wang, Bo Xu, Wei Wang, Yingyuan Xiao, Ming Du, and Junfeng Zhou. 2021. Tao: A Learning Framework for Adaptive Nearest Neighbor Search using Static Features Only. arXiv:2110.00696 (2021)."},{"key":"e_1_2_1_113_1","volume-title":"Bridging Speed and Accuracy to Approximate K-Nearest Neighbor Search. arXiv:2404.16322","author":"Yang Mingyu","year":"2024","unstructured":"Mingyu Yang, Jiabao Jin, Xiangyu Wang, Zhitao Shen, Wei Jia, Wentao Li, and Wei Wang. 2024. Bridging Speed and Accuracy to Approximate K-Nearest Neighbor Search. arXiv:2404.16322 (2024)."},{"key":"e_1_2_1_114_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3386131"},{"key":"e_1_2_1_115_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE53745.2022.00046"},{"key":"e_1_2_1_116_1","volume-title":"Routing-Guided Learned Product Quantization for Graph-Based Approximate Nearest Neighbor Search. In IEEE 40th International Conference on Data Engineering (ICDE). 4870--4883","author":"Yue Qiang","year":"2024","unstructured":"Qiang Yue, Xiaoliang Xu, Yuxiang Wang, Yikun Tao, and Xuliyuan Luo. 2024. Routing-Guided Learned Product Quantization for Graph-Based Approximate Nearest Neighbor Search. In IEEE 40th International Conference on Data Engineering (ICDE). 4870--4883."},{"key":"e_1_2_1_117_1","doi-asserted-by":"publisher","DOI":"10.1145\/3613424.3614292"},{"key":"e_1_2_1_118_1","doi-asserted-by":"publisher","DOI":"10.1145\/3459637.3482358"},{"key":"e_1_2_1_119_1","volume-title":"Connecting Compression Spaces with Transformer for Approximate Nearest Neighbor Search. In European Conference on Computer Vision (ECCV)","volume":"13674","author":"Zhang Haokui","year":"2022","unstructured":"Haokui Zhang, Buzhou Tang, Wenze Hu, and Xiaoyu Wang. 2022. Connecting Compression Spaces with Transformer for Approximate Nearest Neighbor Search. In European Conference on Computer Vision (ECCV), Vol. 13674. 515--530."},{"key":"e_1_2_1_120_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-022-00762-0"},{"key":"e_1_2_1_121_1","volume-title":"17th USENIX Symposium on Operating Systems Design and Implementation (OSDI). 377--395","author":"Zhang Qianxi","year":"2023","unstructured":"Qianxi Zhang, Shuotao Xu, Qi Chen, Guoxin Sui, Jiadong Xie, Zhizhen Cai, Yaoqi Chen, Yinxuan He, Yuqing Yang, Fan Yang, et al . 2023. VBASE: Unifying Online Vector Similarity Search and Relational Queries via Relaxed Monotonicity. In 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI). 377--395."},{"key":"e_1_2_1_122_1","volume-title":"Proceedings of the 31th International Conference on Machine Learning (ICML)","volume":"32","author":"Zhang Ting","year":"2014","unstructured":"Ting Zhang, Chao Du, and Jingdong Wang. 2014. Composite Quantization for Approximate Nearest Neighbor Search. In Proceedings of the 31th International Conference on Machine Learning (ICML), Vol. 32. 838--846."},{"key":"e_1_2_1_123_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE60146.2024.00280"},{"key":"e_1_2_1_124_1","volume-title":"Fast Vector Query Processing for Large Datasets Beyond GPU Memory with Reordered Pipelining. In 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI). 23--40","author":"Zhang Zili","year":"2024","unstructured":"Zili Zhang, Fangyue Liu, Gang Huang, Xuanzhe Liu, and Xin Jin. 2024. Fast Vector Query Processing for Large Datasets Beyond GPU Memory with Reordered Pipelining. In 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI). 23--40."},{"key":"e_1_2_1_125_1","volume-title":"SONG: Approximate Nearest Neighbor Search on GPU. In 36th IEEE International Conference on Data Engineering (ICDE). 1033--1044","author":"Zhao Weijie","year":"2020","unstructured":"Weijie Zhao, Shulong Tan, and Ping Li. 2020. SONG: Approximate Nearest Neighbor Search on GPU. In 36th IEEE International Conference on Data Engineering (ICDE). 1033--1044."},{"key":"e_1_2_1_126_1","doi-asserted-by":"publisher","DOI":"10.14778\/3594512.3594527"},{"key":"e_1_2_1_127_1","doi-asserted-by":"publisher","DOI":"10.14778\/3579075.3579084"},{"key":"e_1_2_1_128_1","doi-asserted-by":"publisher","DOI":"10.1145\/3639324"}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3725260","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T18:55:45Z","timestamp":1774983345000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3725260"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,17]]},"references-count":128,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2025,6,17]]}},"alternative-id":["10.1145\/3725260"],"URL":"https:\/\/doi.org\/10.1145\/3725260","relation":{},"ISSN":["2836-6573"],"issn-type":[{"value":"2836-6573","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,17]]}}}