{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,19]],"date-time":"2026-03-19T04:40:44Z","timestamp":1773895244346,"version":"3.50.1"},"reference-count":69,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2024,5,29]],"date-time":"2024-05-29T00:00:00Z","timestamp":1716940800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2024,5,29]]},"abstract":"<jats:p>Applications increasingly leverage mixed-modality data, and must jointly search over vector data, such as embedded images, text and video, as well as structured data, such as attributes and keywords. Proposed methods for this hybrid search setting either suffer from poor performance or support a severely restricted set of search predicates (e.g., only small sets of equality predicates), making them impractical for many applications. To address this, we present ACORN, an approach for performant and predicate-agnostic hybrid search. ACORN builds on Hierarchical Navigable Small Worlds (HNSW), a state-of-the-art graph-based approximate nearest neighbor index, and can be implemented efficiently by extending existing HNSW libraries. ACORN introduces the idea of predicate subgraph traversal to emulate a theoretically ideal, but impractical, hybrid search strategy. ACORN's predicate-agnostic construction algorithm is designed to enable this effective search strategy, while supporting a wide array of predicate sets and query semantics. We systematically evaluate ACORN on both prior benchmark datasets, with simple, low-cardinality predicate sets, and complex multi-modal datasets not supported by prior methods. We show that ACORN achieves state-of-the-art performance on all datasets, outperforming prior methods with 2--1,000\u00d7 higher throughput at a fixed recall. Our code is available at: https:\/\/github.com\/stanford-futuredata\/ACORN.<\/jats:p>","DOI":"10.1145\/3654923","type":"journal-article","created":{"date-parts":[[2024,5,30]],"date-time":"2024-05-30T09:44:53Z","timestamp":1717062293000},"page":"1-27","source":"Crossref","is-referenced-by-count":32,"title":["ACORN: Performant and Predicate-Agnostic Search Over Vector Embeddings and Structured Data"],"prefix":"10.1145","volume":"2","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3464-9556","authenticated-orcid":false,"given":"Liana","family":"Patel","sequence":"first","affiliation":[{"name":"Stanford University, Stanford, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-7215-7425","authenticated-orcid":false,"given":"Peter","family":"Kraft","sequence":"additional","affiliation":[{"name":"DBOS, Inc., Mountain View, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6348-5939","authenticated-orcid":false,"given":"Carlos","family":"Guestrin","sequence":"additional","affiliation":[{"name":"Stanford University, Stanford, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7547-7204","authenticated-orcid":false,"given":"Matei","family":"Zaharia","sequence":"additional","affiliation":[{"name":"UC Berkeley, Berkeley, USA"}]}],"member":"320","published-online":{"date-parts":[[2024,5,30]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"[n. d.]. Filtered Vector Search | Weaviate - vector database. https:\/\/weaviate.io\/developers\/weaviate\/concepts\/ prefiltering"},{"key":"e_1_2_1_2_1","unstructured":"[n. d.]. Pre-label and enrich data with bulk classifications. https:\/\/labelbox.ghost.io\/blog\/pre-label-and-enrich-yourdata-with-bulk-classifications\/"},{"key":"e_1_2_1_3_1","unstructured":"[n. d.]. Q&A over Documents - LlamaIndex 0.8.43. https:\/\/gpt-index.readthedocs.io\/en\/latest\/end_to_end_tutorials\/ question_and_answer.html"},{"key":"e_1_2_1_4_1","unstructured":"2023. Building Chat LangChain. https:\/\/blog.langchain.dev\/building-chat-langchain-2\/"},{"key":"e_1_2_1_5_1","unstructured":"2023. DiskANN. https:\/\/github.com\/microsoft\/DiskANN original-date: 2020-06--18T06:18:06Z."},{"key":"e_1_2_1_6_1","unstructured":"2023. Faiss. https:\/\/github.com\/facebookresearch\/faiss original-date: 2017-02-07T16:07:05Z."},{"key":"e_1_2_1_7_1","unstructured":"2023. Milvus Documentation. https:\/\/github.com\/milvus-io\/milvus-docs original-date: 2020-05--27T09:12:23Z."},{"key":"e_1_2_1_8_1","unstructured":"2023. visual-layer\/fastdup. https:\/\/github.com\/visual-layer\/fastdup original-date: 2022-05--11T09:10:01Z."},{"key":"e_1_2_1_9_1","unstructured":"Ann Arbor Algorithms. 2023. KGraph: A Library for Approximate Nearest Neighbor Search. https:\/\/github.com\/ aaalgo\/kgraph original-date: 2015-05--29T12:38:24Z."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/1327452.1327494"},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the 28th International Conference on Neural Information Processing Systems -","volume":"1","author":"Andoni Alexandr","year":"2015","unstructured":"Alexandr Andoni, Piotr Indyk, Thijs Laarhoven, Ilya Razenshteyn, and Ludwig Schmidt. 2015. Practical and optimal LSH for angular distance. In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 1 (NIPS'15). MIT Press, Cambridge, MA, USA, 1225--1233."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/2746539.2746553"},{"key":"e_1_2_1_13_1","volume-title":"NHQ: An Efficient and Robust Framework for Approximate Nearest Neighbor Search with Attribute Constraint. https:\/\/github.com\/AshenOn3\/NHQ original-date: 2021-09-09T08:28:21Z.","year":"2023","unstructured":"AshenOn3. 2023. NHQ: An Efficient and Robust Framework for Approximate Nearest Neighbor Search with Attribute Constraint. https:\/\/github.com\/AshenOn3\/NHQ original-date: 2021-09-09T08:28:21Z."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.is.2019"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","unstructured":"Dmitry Baranchuk Artem Babenko and Yury Malkov. 2018. Revisiting the Inverted Indices for Billion-Scale Approximate Nearest Neighbors. https:\/\/doi.org\/10.48550\/arXiv.1802.02422 arXiv:1802.02422 [cs].","DOI":"10.48550\/arXiv.1802.02422"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/361002.361007"},{"key":"e_1_2_1_17_1","unstructured":"Erik Bernhardsson. [n. d.]. annoy: Approximate Nearest Neighbors in C\/Python optimized for memory usage and loading\/saving to disk. https:\/\/github.com\/spotify\/annoy"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1143844.1143857"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3447548.3467081"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/1374376.1374452"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/1963405.1963487"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3534678.3539071"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.14778\/3303753.3303754"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2013.240"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.5555\/645925.671516"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3543507.3583552"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.14778\/3397230.3397243"},{"key":"e_1_2_1_28_1","volume-title":"Proceedings of the 37th International Conference on Machine Learning (ICML'20","volume":"3896","author":"Guo Ruiqi","year":"2020","unstructured":"Ruiqi Guo, Philip Sun, Erik Lindgren, Quan Geng, David Simcha, Felix Chern, and Sanjiv Kumar. 2020. Accelerating large-scale inference with anisotropic vector quantization. In Proceedings of the 37th International Conference on Machine Learning (ICML'20, Vol. 119). JMLR.org, 3887--3896."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/276698.276876"},{"key":"e_1_2_1_31_1","volume-title":"Similarity Search and Applications (Lecture Notes in Computer Science), Shin'ichi Satoh, Lucia Vadicamo, Arthur Zimek, Fabio Carrara, Ilaria Bartolini, Martin Aum\u00fcller, Bj\u00f6rn \u00de\u00f3r J\u00f3nsson","author":"Jafari Omid","unstructured":"Omid Jafari, Parth Nagarkar, and Jonathan Monta\u00f1o. 2020. mmLSH: A Practical and Efficient Technique for Processing Approximate Nearest Neighbor Queries on Multimedia Data. In Similarity Search and Applications (Lecture Notes in Computer Science), Shin'ichi Satoh, Lucia Vadicamo, Arthur Zimek, Fabio Carrara, Ilaria Bartolini, Martin Aum\u00fcller, Bj\u00f6rn \u00de\u00f3r J\u00f3nsson, and Rasmus Pagh (Eds.). Springer International Publishing, Cham, 47--61. https:\/\/doi.org\/10.1007\/ 978--3-030--60936--8_4"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/5.163414"},{"key":"e_1_2_1_33_1","volume-title":"Ravishankar Krishnawamy, and Rohan Kadekodi.","author":"Subramanya Suhas Jayaram","year":"2019","unstructured":"Suhas Jayaram Subramanya, Fnu Devvrit, Harsha Vardhan Simhadri, Ravishankar Krishnawamy, and Rohan Kadekodi. 2019. DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node. In Advances in Neural Information Processing Systems, Vol. 32. Curran Associates, Inc. https:\/\/papers.nips.cc\/paper_files\/paper\/2019\/hash\/ 09853c7fb1d3f8ee67a61b6bf4a7f8e6-Abstract.html"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1007\/978--3--540--88682--2_24"},{"key":"e_1_2_1_35_1","unstructured":"Jeff Johnson Matthijs Douze and Herv\u00e9 J\u00e9gou. 2017. Billion-scale similarity search with GPUs. http:\/\/arxiv.org\/abs\/ 1702.08734 arXiv:1702.08734 [cs]."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI"},{"key":"e_1_2_1_37_1","doi-asserted-by":"crossref","unstructured":"Vladimir Karpukhin Barlas Oguz Sewon Min Patrick Lewis Ledell Wu Sergey Edunov Danqi Chen and Wen-tau Yih. 2020. Dense Passage Retrieval for Open-Domain Question Answering. https:\/\/arxiv.org\/abs\/2004.04906v3","DOI":"10.18653\/v1\/2020.emnlp-main.550"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.1538--4632.1969.tb00615.x"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF00977785"},{"key":"e_1_2_1_40_1","doi-asserted-by":"crossref","unstructured":"V. Lempitsky and A. Babenko. 2012. The inverted multi-index. IEEE Computer Society 3069--3076. https:\/\/doi.org\/10. 1109\/CVPR.2012.6248038 ISSN: 1063--6919.","DOI":"10.1109\/CVPR.2012.6248038"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE48307.2020.00032"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-020-00635--4"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3447548.3467149"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1023\/B:VISI.0000029664.99615.94"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE48307.2020.00095"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.14778\/3397230.3397240"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.14778\/3137765.3137836"},{"key":"e_1_2_1_48_1","volume-title":"Approximate nearest neighbor algorithm based on navigable small world graphs. Information Systems 45 (Sept","author":"Malkov Yury","year":"2014","unstructured":"Yury Malkov, Alexander Ponomarenko, Andrey Logvinov, and Vladimir Krylov. 2014. Approximate nearest neighbor algorithm based on navigable small world graphs. Information Systems 45 (Sept. 2014), 61--68. https:\/\/doi.org\/10.1016\/ j.is.2013.10.006"},{"key":"e_1_2_1_49_1","unstructured":"Yu A. Malkov and D. A. Yashunin. 2018. Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs. http:\/\/arxiv.org\/abs\/1603.09320 arXiv:1603.09320 [cs]."},{"key":"e_1_2_1_50_1","volume-title":"Ali Mousavi, Ihab F. Ilyas, Umar Farooq Minhas, Jeffrey Pound, and Theodoros Rekatsinas.","author":"Mohoney Jason","year":"2023","unstructured":"Jason Mohoney, Anil Pacaci, Shihabur Rahman Chowdhury, Ali Mousavi, Ihab F. Ilyas, Umar Farooq Minhas, Jeffrey Pound, and Theodoros Rekatsinas. 2023. High-Throughput Vector Similarity Search in Knowledge Graphs. http: \/\/arxiv.org\/abs\/2304.01926 arXiv:2304.01926 [cs]."},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1007\/s007780200060"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.14778\/2850583.2850589"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2103.00020"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/3404835.3463242"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","unstructured":"Christoph Schuhmann Richard Vencu Romain Beaumont Robert Kaczmarczyk Clayton Mullis Aarush Katta Theo Coombes Jenia Jitsev and Aran Komatsuzaki. 2021. LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs. https:\/\/doi.org\/10.48550\/arXiv.2111.02114 arXiv:2111.02114 [cs].","DOI":"10.48550\/arXiv.2111.02114"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2008.4587638"},{"key":"e_1_2_1_58_1","volume-title":"Suhas Jayaram Subramanya, and Jingdong Wang","author":"Simhadri Harsha Vardhan","year":"2022","unstructured":"Harsha Vardhan Simhadri, George Williams, Martin Aum\u00fcller, Matthijs Douze, Artem Babenko, Dmitry Baranchuk, Qi Chen, Lucas Hosseini, Ravishankar Krishnaswamy, Gopal Srinivasa, Suhas Jayaram Subramanya, and Jingdong Wang. 2022. Results of the NeurIPS'21 Challenge on Billion-Scale Approximate Nearest Neighbor Search. http:\/\/arxiv.org\/abs\/2205.03763 arXiv:2205.03763 [cs]."},{"key":"e_1_2_1_59_1","volume-title":"Ravishankar Krishnaswamy, and Harsha Vardhan Simhadri.","author":"Singh Aditi","year":"2021","unstructured":"Aditi Singh, Suhas Jayaram Subramanya, Ravishankar Krishnaswamy, and Harsha Vardhan Simhadri. 2021. FreshDiskANN: A Fast and Accurate Graph-Based ANN Index for Streaming Similarity Search. https:\/\/doi.org\/10. 48550\/arXiv.2105.09613 arXiv:2105.09613 [cs]."},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.14778\/2556549.2556574"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1016\/0031--3203(80)90066--7"},{"key":"e_1_2_1_62_1","unstructured":"Andrei Vasnetsov. [n. d.]. Filtrable HNSW - Qdrant. https:\/\/qdrant.tech\/articles\/filtrable-hnsw\/"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3457550"},{"key":"e_1_2_1_64_1","unstructured":"Mengzhao Wang Lingwei Lv Xiaoliang Xu Yuxiang Wang Qiang Yue and Jiongkang Ni. 2022. Navigable Proximity Graph-Driven Native Hybrid Queries with Structured and Unstructured Constraints. http:\/\/arxiv.org\/abs\/2203.13601 arXiv:2203.13601 [cs]."},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.14778\/3415478.3415541"},{"key":"e_1_2_1_66_1","volume-title":"HQANN: Efficient and Robust Similarity Search for Hybrid Queries with Structured and Unstructured Constraints","author":"Wu Wei","year":"2022","unstructured":"Wei Wu, Junlin He, Yu Qiao, Guoheng Fu, Li Liu, and Jin Yu. 2022. HQANN: Efficient and Robust Similarity Search for Hybrid Queries with Structured and Unstructured Constraints. http:\/\/arxiv.org\/abs\/2207.07940 arXiv:2207.07940 [cs]."},{"key":"e_1_2_1_67_1","unstructured":"Qianxi Zhang Shuotao Xu Qi Chen Guoxin Sui Jiadong Xie Zhizhen Cai Yaoqi Chen Yinxuan He Yuqing Yang Fan Yang Mao Yang and Lidong Zhou. 2023. {VBASE}: Unifying Online Vector Similarity Search and Relational Queries via Relaxed Monotonicity. 377--395. https:\/\/www.usenix.org\/conference\/osdi23\/presentation\/zhang-qianxi"},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE48307.2020.00094"},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.14778\/3377369.3377374"}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3654923","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3654923","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,21]],"date-time":"2025-08-21T14:41:41Z","timestamp":1755787301000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3654923"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,29]]},"references-count":69,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2024,5,29]]}},"alternative-id":["10.1145\/3654923"],"URL":"https:\/\/doi.org\/10.1145\/3654923","relation":{},"ISSN":["2836-6573"],"issn-type":[{"value":"2836-6573","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,5,29]]}}}