{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,28]],"date-time":"2025-08-28T00:06:15Z","timestamp":1756339575109,"version":"3.44.0"},"reference-count":104,"publisher":"Association for Computing Machinery (ACM)","issue":"5","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2025,1]]},"abstract":"<jats:p>Vector databases are designed to effectively store, organize, and retrieve high-dimensional vectors, enabling faster and more accurate querying and analysis. This study highlights that the performance of cutting-edge vector databases hinges on their proficiency in managing heterogeneous data embedding and handling compound queries. The former task revolves around converting varied data types into a cohesive vector format, while the latter involves processing multimodal or single-modal queries with precise constraints. The paper advocates for evaluating these dual tasks within an integrated benchmark framework. However, state-of-the-art vector database benchmarks overlook heterogeneous data embedding and compound queries, creating a gap in evaluating vector database performance.<\/jats:p>\n          <jats:p>To address this gap, we introduce BigVectorBench, a benchmark suite designed to evaluate vector database performance. BigVectorBench contributes by defining and evaluating the embedding performance of heterogeneous data. Additionally, it abstracts compound queries, which are increasingly used in real-world applications, replacing unimodal vector searches. Our rigorous evaluations validate the two design decisions of BigVectorBench and identify performance bottlenecks of mainstream vector databases. 
Its source code and user manual are available from https:\/\/github.com\/BenchCouncil\/BigVectorBench.<\/jats:p>","DOI":"10.14778\/3718057.3718078","type":"journal-article","created":{"date-parts":[[2025,8,27]],"date-time":"2025-08-27T18:11:49Z","timestamp":1756318309000},"page":"1536-1550","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["BigVectorBench: Heterogeneous Data Embedding and Compound Queries are Essential in Evaluating Vector Databases"],"prefix":"10.14778","volume":"18","author":[{"given":"Guoxin","family":"Kang","sequence":"first","affiliation":[{"name":"Institute of Computing Technology, Chinese Academy of Sciences"}]},{"given":"Zhongxin","family":"Ge","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology, Chinese Academy of Sciences"}]},{"given":"Jingpei","family":"Hu","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology, Chinese Academy of Sciences"}]},{"given":"Xueya","family":"Zhang","sequence":"additional","affiliation":[{"name":"University of Chinese Academy of Sciences"}]},{"given":"Lei","family":"Wang","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology, Chinese Academy of Sciences"}]},{"given":"Jianfeng","family":"Zhan","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology, Chinese Academy of Sciences"}]}],"member":"320","published-online":{"date-parts":[[2025,8,27]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2007. Tesseract ocr tool. https:\/\/tesseract-ocr.github.io\/"},{"key":"e_1_2_1_2_1","unstructured":"2015. Annoy. https:\/\/github.com\/spotify\/annoy"},{"key":"e_1_2_1_3_1","unstructured":"2016. Weaviate. https:\/\/github.com\/weaviate\/weaviate"},{"key":"e_1_2_1_4_1","unstructured":"2019. Milvus. https:\/\/github.com\/milvus-io\/milvus"},{"key":"e_1_2_1_5_1","unstructured":"2021. Qdrant. 
https:\/\/github.com\/qdrant\/qdrant"},{"key":"e_1_2_1_6_1","unstructured":"2022. Image wikipedia dataset. https:\/\/huggingface.co\/datasets\/israfelsr\/img-wikipedia-simple"},{"key":"e_1_2_1_7_1","unstructured":"2024. OpenAI embeddings. https:\/\/platform.openai.com\/docs\/guides\/embeddings"},{"key":"e_1_2_1_8_1","volume-title":"Practical and optimal LSH for angular distance. Advances in neural information processing systems 28","author":"Andoni Alexandr","year":"2015","unstructured":"Alexandr Andoni, Piotr Indyk, Thijs Laarhoven, Ilya Razenshteyn, and Ludwig Schmidt. 2015. Practical and optimal LSH for angular distance. Advances in neural information processing systems 28 (2015)."},{"key":"e_1_2_1_9_1","volume-title":"Proceedings of the forty-seventh annual ACM symposium on Theory of computing. 793\u2013801","author":"Andoni Alexandr","year":"2015","unstructured":"Alexandr Andoni and Ilya Razenshteyn. 2015. Optimal data-dependent hashing for approximate near neighbors. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing. 793\u2013801."},{"key":"e_1_2_1_10_1","volume-title":"Hd-index: Pushing the scalability-accuracy boundary for approximate knn search in high-dimensional spaces. arXiv preprint arXiv:1804.06829","author":"Arora Akhil","year":"2018","unstructured":"Akhil Arora, Sakshi Sinha, Piyush Kumar, and Arnab Bhattacharya. 2018. Hd-index: Pushing the scalability-accuracy boundary for approximate knn search in high-dimensional spaces. arXiv preprint arXiv:1804.06829 (2018)."},{"key":"e_1_2_1_11_1","volume-title":"International conference on similarity search and applications. Springer, 34\u201349","author":"Aum\u00fcller Martin","year":"2017","unstructured":"Martin Aum\u00fcller, Erik Bernhardsson, and Alexander Faithfull. 2017. ANN-benchmarks: A benchmarking tool for approximate nearest neighbor algorithms. In International conference on similarity search and applications. 
Springer, 34\u201349."},{"key":"e_1_2_1_12_1","doi-asserted-by":"crossref","first-page":"101374","DOI":"10.1016\/j.is.2019.02.006","article-title":"ANN-Benchmarks: A benchmarking tool for approximate nearest neighbor algorithms","volume":"87","author":"Aum\u00fcller Martin","year":"2020","unstructured":"Martin Aum\u00fcller, Erik Bernhardsson, and Alexander Faithfull. 2020. ANN-Benchmarks: A benchmarking tool for approximate nearest neighbor algorithms. Information Systems 87 (2020), 101374.","journal-title":"Information Systems"},{"key":"e_1_2_1_13_1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2055\u20132063","author":"Babenko Artem","year":"2016","unstructured":"Artem Babenko and Victor Lempitsky. 2016. Efficient indexing of billion-scale datasets of deep descriptors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2055\u20132063."},{"key":"e_1_2_1_14_1","volume-title":"wav2vec 2.0: A framework for self-supervised learning of speech representations. Advances in neural information processing systems 33","author":"Baevski Alexei","year":"2020","unstructured":"Alexei Baevski, Yuhao Zhou, Abdelrahman Mohamed, and Michael Auli. 2020. wav2vec 2.0: A framework for self-supervised learning of speech representations. Advances in neural information processing systems 33 (2020), 12449\u201312460."},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV). 1728\u20131738","author":"Bain Max","year":"2021","unstructured":"Max Bain, Arsha Nagrani, G\u00fcl Varol, and Andrew Zisserman. 2021. Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval. In Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV). 
1728\u20131738."},{"key":"e_1_2_1_16_1","volume-title":"Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data","author":"Cappuzzo Riccardo","year":"2020","unstructured":"Riccardo Cappuzzo, Paolo Papotti, and Saravanan Thirumuruganathan. 2020. Creating Embeddings of Heterogeneous Relational Datasets for Data Integration Tasks. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (Portland, OR, USA) (SIGMOD '20). Association for Computing Machinery, New York, NY, USA, 1335\u20131349. 10.1145\/3318464.3389742"},{"key":"e_1_2_1_17_1","volume-title":"Bge m3-embedding: Multi-lingual, multi-functionality, multi-granularity text embeddings through self-knowledge distillation. arXiv preprint arXiv:2402.03216","author":"Chen Jianlv","year":"2024","unstructured":"Jianlv Chen, Shitao Xiao, Peitian Zhang, Kun Luo, Defu Lian, and Zheng Liu. 2024. Bge m3-embedding: Multi-lingual, multi-functionality, multi-granularity text embeddings through self-knowledge distillation. arXiv preprint arXiv:2402.03216 (2024)."},{"key":"e_1_2_1_18_1","volume-title":"SPTAG: A library for fast approximate nearest neighbor search. https:\/\/github.com\/Microsoft\/SPTAG","author":"Chen Qi","year":"2018","unstructured":"Qi Chen, Haidong Wang, Mingqin Li, Gang Ren, Jeffery Zhu, Jason Li, Lintao Zhang, and Jingdong Wang. 2018. SPTAG: A library for fast approximate nearest neighbor search. https:\/\/github.com\/Microsoft\/SPTAG"},{"key":"e_1_2_1_19_1","first-page":"5199","article-title":"Spann: Highly-efficient billion-scale approximate nearest neighborhood search","volume":"34","author":"Chen Qi","year":"2021","unstructured":"Qi Chen, Bing Zhao, Haidong Wang, Mingqin Li, Chuanjie Liu, Zengzhong Li, Mao Yang, and Jingdong Wang. 2021. Spann: Highly-efficient billion-scale approximate nearest neighborhood search. 
Advances in Neural Information Processing Systems 34 (2021), 5199\u20135212.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_1_20_1","volume-title":"European conference on computer vision. Springer, 104\u2013120","author":"Chen Yen-Chun","year":"2020","unstructured":"Yen-Chun Chen, Linjie Li, Licheng Yu, Ahmed El Kholy, Faisal Ahmed, Zhe Gan, Yu Cheng, and Jingjing Liu. 2020. Uniter: Universal image-text representation learning. In European conference on computer vision. Springer, 104\u2013120."},{"key":"e_1_2_1_21_1","first-page":"18","volume-title":"Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","volume":"2","author":"Cohan Arman","year":"2018","unstructured":"Arman Cohan, Franck Dernoncourt, Doo Soon Kim, Trung Bui, Seokhwan Kim, Walter Chang, and Nazli Goharian. 2018. A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers). Association for Computational Linguistics, New Orleans, Louisiana, 615\u2013621. 10.18653\/v1\/N18-2097"},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the 2019 IEEE\/ACM International Conference on Advances in Social Networks Analysis and Mining","author":"Cui Limeng","year":"2020","unstructured":"Limeng Cui, Suhang Wang, and Dongwon Lee. 2020. SAME: sentiment-aware multi-modal embedding for detecting fake news. In Proceedings of the 2019 IEEE\/ACM International Conference on Advances in Social Networks Analysis and Mining (Vancouver, British Columbia, Canada) (ASONAM '19). Association for Computing Machinery, New York, NY, USA, 41\u201348. 10.1145\/3341161.3342894"},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of the fortieth annual ACM symposium on Theory of computing. 
537\u2013546","author":"Dasgupta Sanjoy","year":"2008","unstructured":"Sanjoy Dasgupta and Yoav Freund. 2008. Random projection trees and low dimensional manifolds. In Proceedings of the fortieth annual ACM symposium on Theory of computing. 537\u2013546."},{"key":"e_1_2_1_24_1","volume-title":"Conference on learning theory. PMLR, 317\u2013337","author":"Dasgupta Sanjoy","year":"2013","unstructured":"Sanjoy Dasgupta and Kaushik Sinha. 2013. Randomized partition trees for exact nearest neighbor search. In Conference on learning theory. PMLR, 317\u2013337."},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of the twentieth annual symposium on Computational geometry. 253\u2013262","author":"Datar Mayur","year":"2004","unstructured":"Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S Mirrokni. 2004. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the twentieth annual symposium on Computational geometry. 253\u2013262."},{"key":"e_1_2_1_26_1","volume-title":"2009 IEEE conference on computer vision and pattern recognition. Ieee, 248\u2013255","author":"Deng Jia","year":"2009","unstructured":"Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition. Ieee, 248\u2013255."},{"key":"e_1_2_1_27_1","volume-title":"Proceedings of the 20th international conference on World wide web. 577\u2013586","author":"Dong Wei","year":"2011","unstructured":"Wei Dong, Moses Charikar, and Kai Li. 2011. Efficient k-nearest neighbor graph construction for generic similarity measures. In Proceedings of the 20th international conference on World wide web. 577\u2013586."},{"key":"e_1_2_1_28_1","volume-title":"CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet. 
arXiv preprint arXiv:2212.06138","author":"Dong Xiaoyi","year":"2022","unstructured":"Xiaoyi Dong, Jianmin Bao, Ting Zhang, Dongdong Chen, Gu Shuyang, Weiming Zhang, Lu Yuan, Dong Chen, Fang Wen, and Nenghai Yu. 2022. CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet. arXiv preprint arXiv:2212.06138 (2022)."},{"key":"e_1_2_1_29_1","unstructured":"Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn Xiaohua Zhai Thomas Unterthiner Mostafa Dehghani Matthias Minderer Georg Heigold Sylvain Gelly et al. 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)."},{"key":"e_1_2_1_30_1","unstructured":"Matthijs Douze Alexandr Guzhva Chengqi Deng Jeff Johnson Gergely Szilvasy Pierre-Emmanuel Mazar\u00e9 Maria Lomeli Lucas Hosseini and Herv\u00e9 J\u00e9gou. 2024. The Faiss library. (2024). arXiv:2401.08281 [cs.LG]"},{"key":"e_1_2_1_31_1","volume-title":"Fast approximate nearest neighbor search with the navigating spreading-out graph. arXiv preprint arXiv:1707.00143","author":"Fu Cong","year":"2017","unstructured":"Cong Fu, Chao Xiang, Changxu Wang, and Deng Cai. 2017. Fast approximate nearest neighbor search with the navigating spreading-out graph. arXiv preprint arXiv:1707.00143 (2017)."},{"key":"e_1_2_1_32_1","volume-title":"Retrieval-Augmented Generation for Large Language Models: A Survey. ArXiv abs\/2312.10997","author":"Gao Yunfan","year":"2023","unstructured":"Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Qianyu Guo, Meng Wang, and Haofen Wang. 2023. Retrieval-Augmented Generation for Large Language Models: A Survey. ArXiv abs\/2312.10997 (2023). https:\/\/api.semanticscholar.org\/CorpusID:266359151"},{"key":"e_1_2_1_33_1","volume-title":"Proceedings of the IEEE conference on computer vision and pattern recognition. 
2946\u20132953","author":"Ge Tiezheng","year":"2013","unstructured":"Tiezheng Ge, Kaiming He, Qifa Ke, and Jian Sun. 2013. Optimized product quantization for approximate nearest neighbor search. In Proceedings of the IEEE conference on computer vision and pattern recognition. 2946\u20132953."},{"key":"e_1_2_1_34_1","volume-title":"Vector quantization and signal compression","author":"Gersho Allen","unstructured":"Allen Gersho and Robert M Gray. 2012. Vector quantization and signal compression. Vol. 159. Springer Science & Business Media."},{"key":"e_1_2_1_35_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 15180\u201315190","author":"Girdhar Rohit","year":"2023","unstructured":"Rohit Girdhar, Alaaeldin El-Nouby, Zhuang Liu, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, and Ishan Misra. 2023. Imagebind: One embedding space to bind them all. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 15180\u201315190."},{"key":"e_1_2_1_36_1","volume-title":"Proceedings of the 2nd ACM SIGSOFT International Workshop on App Market Analytics","author":"Grano Giovanni","year":"2017","unstructured":"Giovanni Grano, Andrea Di Sorbo, Francesco Mercaldo, Corrado A. Visaggio, Gerardo Canfora, and Sebastiano Panichella. 2017. Android apps and user feedback: a dataset for software evolution and quality improvement. In Proceedings of the 2nd ACM SIGSOFT International Workshop on App Market Analytics (Paderborn, Germany) (WAMA 2017). Association for Computing Machinery, New York, NY, USA, 8\u201311. 10.1145\/3121264.3121266"},{"key":"e_1_2_1_37_1","doi-asserted-by":"crossref","first-page":"4","DOI":"10.1109\/MASSP.1984.1162229","article-title":"Vector quantization","volume":"1","author":"Gray Robert","year":"1984","unstructured":"Robert Gray. 1984. Vector quantization. 
IEEE Assp Magazine 1, 2 (1984), 4\u201329.","journal-title":"IEEE Assp Magazine"},{"key":"e_1_2_1_38_1","volume-title":"Neuhoff","author":"Gray Robert M.","year":"1998","unstructured":"Robert M. Gray and David L. Neuhoff. 1998. Quantization. IEEE transactions on information theory 44, 6 (1998), 2325\u20132383."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.14778\/3554821.3554843"},{"key":"e_1_2_1_40_1","volume-title":"International Conference on Machine Learning. PMLR, 3887\u20133896","author":"Guo Ruiqi","year":"2020","unstructured":"Ruiqi Guo, Philip Sun, Erik Lindgren, Quan Geng, David Simcha, Felix Chern, and Sanjiv Kumar. 2020. Accelerating large-scale inference with anisotropic vector quantization. In International Conference on Machine Learning. PMLR, 3887\u20133896."},{"key":"e_1_2_1_41_1","volume-title":"Susana Guzman, Georgios Mastrapas, Saba Sturua, Bo Wang, Maximilian Werk, Nan Wang, and Han Xiao.","author":"G\u00fcnther Michael","year":"2023","unstructured":"Michael G\u00fcnther, Jackmin Ong, Isabelle Mohr, Alaeddine Abdessalem, Tanguy Abel, Mohammad Kalim Akram, Susana Guzman, Georgios Mastrapas, Saba Sturua, Bo Wang, Maximilian Werk, Nan Wang, and Han Xiao. 2023. Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents. arXiv:2310.19923 [cs.CL]"},{"key":"e_1_2_1_42_1","volume-title":"Proceedings of the 15th International Symposium of Information Science","author":"Hamborg Felix","year":"2017","unstructured":"Felix Hamborg, Norman Meuschke, Corinna Breitinger, and Bela Gipp. 2017. news-please: A Generic News Crawler and Extractor. In Proceedings of the 15th International Symposium of Information Science (Berlin). 218\u2013223. 10.5281\/zenodo.4120316"},{"key":"e_1_2_1_43_1","unstructured":"Awni Hannun Carl Case Jared Casper Bryan Catanzaro Greg Diamos Erich Elsen Ryan Prenger Sanjeev Satheesh Shubho Sengupta Adam Coates et al. 2014. Deep speech: Scaling up end-to-end speech recognition. 
arXiv preprint arXiv:1412.5567 (2014)."},{"key":"e_1_2_1_44_1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5713\u20135722","author":"Harwood Ben","year":"2016","unstructured":"Ben Harwood and Tom Drummond. 2016. Fanng: Fast approximate nearest neighbour graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5713\u20135722."},{"key":"e_1_2_1_45_1","volume-title":"Proceedings of the IEEE conference on computer vision and pattern recognition. 770\u2013778","author":"He Kaiming","year":"2016","unstructured":"Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition. 770\u2013778."},{"key":"e_1_2_1_46_1","volume-title":"Rank-based similarity search: Reducing the dimensional dependence","author":"Houle Michael E","year":"2014","unstructured":"Michael E Houle and Michael Nett. 2014. Rank-based similarity search: Reducing the dimensional dependence. IEEE transactions on pattern analysis and machine intelligence 37, 1 (2014), 136\u2013150."},{"key":"e_1_2_1_47_1","volume-title":"Proceedings of the thirtieth annual ACM symposium on Theory of computing. 604\u2013613","author":"Indyk Piotr","year":"1998","unstructured":"Piotr Indyk and Rajeev Motwani. 1998. Approximate nearest neighbors: towards removing the curse of dimensionality. In Proceedings of the thirtieth annual ACM symposium on Theory of computing. 604\u2013613."},{"key":"e_1_2_1_48_1","volume-title":"Few-shot Learning with Retrieval Augmented Language Models. ArXiv abs\/2208.03299","author":"Izacard Gautier","year":"2022","unstructured":"Gautier Izacard, Patrick Lewis, Maria Lomeli, Lucas Hosseini, Fabio Petroni, Timo Schick, Jane A. Yu, Armand Joulin, Sebastian Riedel, and Edouard Grave. 2022. Few-shot Learning with Retrieval Augmented Language Models. ArXiv abs\/2208.03299 (2022). 
https:\/\/api.semanticscholar.org\/CorpusID:251371732"},{"key":"e_1_2_1_49_1","volume-title":"Ravishankar Krishnaswamy, and Rohan Kadekodi.","author":"Subramanya Suhas Jayaram","year":"2019","unstructured":"Suhas Jayaram Subramanya, Fnu Devvrit, Harsha Vardhan Simhadri, Ravishankar Krishnaswamy, and Rohan Kadekodi. 2019. Diskann: Fast accurate billion-point nearest neighbor search on a single node. Advances in Neural Information Processing Systems 32 (2019)."},{"key":"e_1_2_1_50_1","volume-title":"Product quantization for nearest neighbor search","author":"Jegou Herve","year":"2010","unstructured":"Herve Jegou, Matthijs Douze, and Cordelia Schmid. 2010. Product quantization for nearest neighbor search. IEEE transactions on pattern analysis and machine intelligence 33, 1 (2010), 117\u2013128."},{"key":"e_1_2_1_51_1","doi-asserted-by":"crossref","first-page":"535","DOI":"10.1109\/TBDATA.2019.2921572","article-title":"Billion-scale similarity search with GPUs","volume":"7","author":"Johnson Jeff","year":"2019","unstructured":"Jeff Johnson, Matthijs Douze, and Herv\u00e9 J\u00e9gou. 2019. Billion-scale similarity search with GPUs. IEEE Transactions on Big Data 7, 3 (2019), 535\u2013547.","journal-title":"IEEE Transactions on Big Data"},{"key":"e_1_2_1_52_1","volume-title":"Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics: human language technologies. 1490\u20131500","author":"Komninos Alexandros","year":"2016","unstructured":"Alexandros Komninos and Suresh Manandhar. 2016. Dependency based embeddings for sentence classification tasks. In Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics: human language technologies. 1490\u20131500."},{"key":"e_1_2_1_53_1","volume-title":"Vector Databases and Vector Embeddings-Review. In 2023 International Workshop on Artificial Intelligence and Image Processing (IWAIIP). 
231\u2013236","author":"Kukreja Sanjay","year":"2023","unstructured":"Sanjay Kukreja, Tarun Kumar, Vishal Bharate, Amit Purohit, Abhijit Dasgupta, and Debashis Guha. 2023. Vector Databases and Vector Embeddings-Review. In 2023 International Workshop on Artificial Intelligence and Image Processing (IWAIIP). 231\u2013236. 10.1109\/IWAIIP58158.2023.10462847"},{"key":"e_1_2_1_54_1","volume-title":"Proceedings of the 34th International Conference on Neural Information Processing Systems","author":"Lewis Patrick","year":"2020","unstructured":"Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich K\u00fcttler, Mike Lewis, Wen-tau Yih, Tim Rockt\u00e4schel, Sebastian Riedel, and Douwe Kiela. 2020. Retrieval-augmented generation for knowledge-intensive NLP tasks. In Proceedings of the 34th International Conference on Neural Information Processing Systems (Vancouver, BC, Canada) (NIPS '20). Curran Associates Inc., Red Hook, NY, USA, Article 793, 16 pages."},{"key":"e_1_2_1_55_1","first-page":"1475","article-title":"Approximate nearest neighbor search on high dimensional data\u2014experiments, analyses, and improvement","volume":"32","author":"Li Wen","year":"2019","unstructured":"Wen Li, Ying Zhang, Yifang Sun, Wei Wang, Mingjie Li, Wenjie Zhang, and Xuemin Lin. 2019. Approximate nearest neighbor search on high dimensional data\u2014experiments, analyses, and improvement. IEEE Transactions on Knowledge and Data Engineering 32, 8 (2019), 1475\u20131488.","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"key":"e_1_2_1_56_1","volume-title":"Proceedings, Part XXX 16","author":"Li Xiujun","year":"2020","unstructured":"Xiujun Li, Xi Yin, Chunyuan Li, Pengchuan Zhang, Xiaowei Hu, Lei Zhang, Lijuan Wang, Houdong Hu, Li Dong, Furu Wei, et al. 2020. Oscar: Object-semantics aligned pre-training for vision-language tasks. 
In Computer Vision-ECCV 2020: 16th European Conference, Glasgow, UK, August 23\u201328, 2020, Proceedings, Part XXX 16. Springer, 121\u2013137."},{"key":"e_1_2_1_57_1","volume-title":"Towards general text embeddings with multi-stage contrastive learning. arXiv preprint arXiv:2308.03281","author":"Li Zehan","year":"2023","unstructured":"Zehan Li, Xin Zhang, Yanzhao Zhang, Dingkun Long, Pengjun Xie, and Meishan Zhang. 2023. Towards general text embeddings with multi-stage contrastive learning. arXiv preprint arXiv:2308.03281 (2023)."},{"key":"e_1_2_1_58_1","volume-title":"Least squares quantization in PCM","author":"Lloyd Stuart","year":"1982","unstructured":"Stuart Lloyd. 1982. Least squares quantization in PCM. IEEE transactions on information theory 28, 2 (1982), 129\u2013137."},{"key":"e_1_2_1_59_1","volume-title":"Vilbert: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks. Advances in neural information processing systems 32","author":"Lu Jiasen","year":"2019","unstructured":"Jiasen Lu, Dhruv Batra, Devi Parikh, and Stefan Lee. 2019. Vilbert: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks. Advances in neural information processing systems 32 (2019)."},{"key":"e_1_2_1_60_1","volume-title":"Proceedings of the IEEE\/CVF international conference on computer vision. 5512\u20135521","author":"Luo Chenxu","year":"2019","unstructured":"Chenxu Luo and Alan L Yuille. 2019. Grouped spatial-temporal aggregation for efficient action recognition. In Proceedings of the IEEE\/CVF international conference on computer vision. 5512\u20135521."},{"key":"e_1_2_1_61_1","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1016\/j.is.2013.10.006","article-title":"Approximate nearest neighbor algorithm based on navigable small world graphs","volume":"45","author":"Malkov Yury","year":"2014","unstructured":"Yury Malkov, Alexander Ponomarenko, Andrey Logvinov, and Vladimir Krylov. 2014. 
Approximate nearest neighbor algorithm based on navigable small world graphs. Information Systems 45 (2014), 61\u201368.","journal-title":"Information Systems"},{"key":"e_1_2_1_62_1","volume-title":"Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs","author":"Malkov Yu A","year":"2018","unstructured":"Yu A Malkov and Dmitry A Yashunin. 2018. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. IEEE transactions on pattern analysis and machine intelligence 42, 4 (2018), 824\u2013836."},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1021\/acs.jcim.9b00266"},{"key":"e_1_2_1_64_1","volume-title":"Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781","author":"Mikolov Tomas","year":"2013","unstructured":"Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)."},{"key":"e_1_2_1_65_1","volume-title":"Distributed representations of words and phrases and their compositionality. Advances in neural information processing systems 26","author":"Mikolov Tomas","year":"2013","unstructured":"Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. Advances in neural information processing systems 26 (2013)."},{"key":"e_1_2_1_66_1","volume-title":"MTEB: Massive text embedding benchmark. arXiv preprint arXiv:2210.07316","author":"Muennighoff Niklas","year":"2022","unstructured":"Niklas Muennighoff, Nouamane Tazi, Lo\u00efc Magne, and Nils Reimers. 2022. MTEB: Massive text embedding benchmark. arXiv preprint arXiv:2210.07316 (2022)."},{"key":"e_1_2_1_67_1","volume-title":"Flann-fast library for approximate nearest neighbors user manual. 
Computer Science Department","author":"Muja Marius","year":"2009","unstructured":"Marius Muja and David Lowe. 2009. Flann-fast library for approximate nearest neighbors user manual. Computer Science Department, University of British Columbia, Vancouver, BC, Canada 5, 6 (2009)."},{"key":"e_1_2_1_68_1","volume-title":"Scalable nearest neighbor algorithms for high dimensional data","author":"Muja Marius","year":"2014","unstructured":"Marius Muja and David G Lowe. 2014. Scalable nearest neighbor algorithms for high dimensional data. IEEE transactions on pattern analysis and machine intelligence 36, 11 (2014), 2227\u20132240."},{"key":"e_1_2_1_69_1","volume-title":"Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP). 188\u2013197","author":"Ni Jianmo","year":"2019","unstructured":"Jianmo Ni, Jiacheng Li, and Julian McAuley. 2019. Justifying recommendations using distantly-labeled reviews and fine-grained aspects. In Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP). 188\u2013197."},{"key":"e_1_2_1_70_1","volume-title":"Nomic Embed: Training a Reproducible Long Context Text Embedder. arXiv:2402.01613 [cs.CL]","author":"Nussbaum Zach","year":"2024","unstructured":"Zach Nussbaum, John X. Morris, Brandon Duderstadt, and Andriy Mulyar. 2024. Nomic Embed: Training a Reproducible Long Context Text Embedder. arXiv:2402.01613 [cs.CL]"},{"key":"e_1_2_1_71_1","volume-title":"Proceedings of the 2024 Conference on Human Information Interaction and Retrieval","author":"Odede Julius","year":"2024","unstructured":"Julius Odede and Ingo Frommholz. 2024. JayBot - Aiding University Students and Admission with an LLM-based Chatbot. 
In Proceedings of the 2024 Conference on Human Information Interaction and Retrieval (Sheffield, United Kingdom) (CHIIR '24). Association for Computing Machinery, New York, NY, USA, 391\u2013395. 10.1145\/3627508.3638293"},{"key":"e_1_2_1_72_1","doi-asserted-by":"crossref","first-page":"1591","DOI":"10.1007\/s00778-024-00864-x","article-title":"Survey of vector database management systems","volume":"33","author":"Pan James Jie","year":"2024","unstructured":"James Jie Pan, Jianguo Wang, and Guoliang Li. 2024. Survey of vector database management systems. The VLDB Journal 33, 5 (2024), 1591\u20131615.","journal-title":"The VLDB Journal"},{"key":"e_1_2_1_73_1","volume-title":"Vector Database Management Techniques and Systems. In Companion of the 2024 International Conference on Management of Data. 597\u2013604","author":"Pan James Jie","year":"2024","unstructured":"James Jie Pan, Jianguo Wang, and Guoliang Li. 2024. Vector Database Management Techniques and Systems. In Companion of the 2024 International Conference on Management of Data. 597\u2013604."},{"key":"e_1_2_1_74_1","volume-title":"Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on. IEEE, 5206\u20135210","author":"Panayotov Vassil","year":"2015","unstructured":"Vassil Panayotov, Guoguo Chen, Daniel Povey, and Sanjeev Khudanpur. 2015. Librispeech: an ASR corpus based on public domain audio books. In Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on. IEEE, 5206\u20135210."},{"key":"e_1_2_1_75_1","volume-title":"Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). 1532\u20131543","author":"Pennington Jeffrey","year":"2014","unstructured":"Jeffrey Pennington, Richard Socher, and Christopher D Manning. 2014. Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). 
1532\u20131543."},{"key":"e_1_2_1_76_1","volume-title":"International conference on machine learning. PMLR, 8748\u20138763","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. In International conference on machine learning. PMLR, 8748\u20138763."},{"key":"e_1_2_1_77_1","volume-title":"Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever.","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. 2021. Learning Transferable Visual Models From Natural Language Supervision. arXiv:2103.00020 [cs.CV] https:\/\/arxiv.org\/abs\/2103.00020"},{"key":"e_1_2_1_78_1","first-page":"18","volume-title":"Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Iryna Gurevych and Yusuke Miyao (Eds.). Association for Computational Linguistics","author":"Rajpurkar Pranav","year":"2018","unstructured":"Pranav Rajpurkar, Robin Jia, and Percy Liang. 2018. Know What You Don't Know: Unanswerable Questions for SQuAD. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Iryna Gurevych and Yusuke Miyao (Eds.). Association for Computational Linguistics, Melbourne, Australia, 784\u2013789. arXiv:1806.03822 [cs.CL] 10.18653\/v1\/P18-2124"},{"key":"e_1_2_1_79_1","first-page":"16","volume-title":"Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Jian Su, Kevin Duh, and Xavier Carreras (Eds.). 
Association for Computational Linguistics","author":"Rajpurkar Pranav","year":"2016","unstructured":"Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ Questions for Machine Comprehension of Text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Jian Su, Kevin Duh, and Xavier Carreras (Eds.). Association for Computational Linguistics, Austin, Texas, 2383\u20132392. arXiv:1606.05250 [cs.CL] 10.18653\/v1\/D16-1264"},{"key":"e_1_2_1_80_1","volume-title":"Proceedings of the 25th acm sigkdd international conference on knowledge discovery & data mining. 1378\u20131388","author":"Ram Parikshit","year":"2019","unstructured":"Parikshit Ram and Kaushik Sinha. 2019. Revisiting kd-tree for nearest neighbor search. In Proceedings of the 25th acm sigkdd international conference on knowledge discovery & data mining. 1378\u20131388."},{"key":"e_1_2_1_81_1","volume-title":"wav2vec: Unsupervised pre-training for speech recognition. arXiv preprint arXiv:1904.05862","author":"Schneider Steffen","year":"2019","unstructured":"Steffen Schneider, Alexei Baevski, Ronan Collobert, and Michael Auli. 2019. wav2vec: Unsupervised pre-training for speech recognition. arXiv preprint arXiv:1904.05862 (2019)."},{"key":"e_1_2_1_82_1","unstructured":"Christoph Schuhmann and Peter Bevan. 2023. https:\/\/huggingface.co\/datasets\/laion\/220k-GPT4Vision-captions-from-LIVIS."},{"key":"e_1_2_1_83_1","volume-title":"2008 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 1\u20138.","author":"Silpa-Anan Chanop","year":"2008","unstructured":"Chanop Silpa-Anan and Richard Hartley. 2008. Optimised KD-trees for fast image descriptor matching. In 2008 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 1\u20138."},{"key":"e_1_2_1_84_1","unstructured":"Harsha Vardhan Simhadri. 2023. big-ann-benchmarks: Framework for evaluating ANNS algorithms on billion scale datasets. 
https:\/\/github.com\/harsha-simhadri\/big-ann-benchmarks"},{"key":"e_1_2_1_85_1","volume-title":"Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556","author":"Simonyan Karen","year":"2014","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)."},{"key":"e_1_2_1_86_1","volume-title":"Lxmert: Learning cross-modality encoder representations from transformers. arXiv preprint arXiv:1908.07490","author":"Tan Hao","year":"2019","unstructured":"Hao Tan and Mohit Bansal. 2019. Lxmert: Learning cross-modality encoder representations from transformers. arXiv preprint arXiv:1908.07490 (2019)."},{"key":"e_1_2_1_87_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.compbiomed.2023.107535"},{"key":"e_1_2_1_88_1","unstructured":"Nandan Thakur Nils Reimers Andreas R\u00fcckl\u00e9 Abhishek Srivastava and Iryna Gurevych. 2021. BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2). https:\/\/openreview.net\/forum?id=wCu6T5xFjeJ"},{"key":"e_1_2_1_89_1","volume-title":"Wavenet: A generative model for raw audio. arXiv preprint arXiv:1609.03499 12","author":"Den Oord Aaron Van","year":"2016","unstructured":"Aaron Van Den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, Koray Kavukcuoglu, et al. 2016. Wavenet: A generative model for raw audio. arXiv preprint arXiv:1609.03499 12 (2016)."},{"key":"e_1_2_1_90_1","volume-title":"Proceedings of the 2021 5th International Conference on Computer Science and Artificial Intelligence","author":"Wang Chenxi","year":"2022","unstructured":"Chenxi Wang and Xudong Luo. 2022. A Legal Question Answering System Based on BERT. 
In Proceedings of the 2021 5th International Conference on Computer Science and Artificial Intelligence (Beijing, China) (CSAI '21). Association for Computing Machinery, New York, NY, USA, 278\u2013283. 10.1145\/3507548.3507591"},{"key":"e_1_2_1_91_1","volume-title":"Proceedings of the 2020 Conference of the Asian Chapter of the Association for Computational Linguistics (AACL): System Demonstrations.","author":"Wang Changhan","year":"2020","unstructured":"Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, and Juan Pino. 2020. fairseq S2T: Fast Speech-to-Text Modeling with fairseq. In Proceedings of the 2020 Conference of the Asian Chapter of the Association for Computational Linguistics (AACL): System Demonstrations."},{"key":"e_1_2_1_92_1","volume-title":"2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 1106\u20131113","author":"Wang Jing","year":"2012","unstructured":"Jing Wang, Jingdong Wang, Gang Zeng, Zhuowen Tu, Rui Gan, and Shipeng Li. 2012. Scalable k-nn graph construction for visual descriptors. In 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 1106\u20131113."},{"key":"e_1_2_1_93_1","volume-title":"Proceedings of the 2021 International Conference on Management of Data. 2614\u20132627","author":"Wang Jianguo","year":"2021","unstructured":"Jianguo Wang, Xiaomeng Yi, Rentong Guo, Hai Jin, Peng Xu, Shengjun Li, Xiangyu Wang, Xiangzhou Guo, Chengming Li, Xiaohai Xu, et al. 2021. Milvus: A purpose-built vector data management system. In Proceedings of the 2021 International Conference on Management of Data. 2614\u20132627."},{"key":"e_1_2_1_94_1","volume-title":"Temporal segment networks for action recognition in videos","author":"Wang Limin","year":"2018","unstructured":"Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, and Luc Van Gool. 2018. Temporal segment networks for action recognition in videos. 
IEEE transactions on pattern analysis and machine intelligence 41, 11 (2018), 2740\u20132755."},{"key":"e_1_2_1_95_1","volume-title":"Text Embeddings by Weakly-Supervised Contrastive Pre-training. arXiv preprint arXiv:2212.03533","author":"Wang Liang","year":"2022","unstructured":"Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, and Furu Wei. 2022. Text Embeddings by Weakly-Supervised Contrastive Pre-training. arXiv preprint arXiv:2212.03533 (2022)."},{"key":"e_1_2_1_96_1","volume-title":"Improving Text Embeddings with Large Language Models. arXiv preprint arXiv:2401.00368","author":"Wang Liang","year":"2023","unstructured":"Liang Wang, Nan Yang, Xiaolong Huang, Linjun Yang, Rangan Majumder, and Furu Wei. 2023. Improving Text Embeddings with Large Language Models. arXiv preprint arXiv:2401.00368 (2023)."},{"key":"e_1_2_1_97_1","unstructured":"Liang Wang Nan Yang Xiaolong Huang Linjun Yang Rangan Majumder and Furu Wei. 2024. Multilingual E5 Text Embeddings: A Technical Report. arXiv preprint arXiv:2402.05672 (2024)."},{"key":"e_1_2_1_98_1","volume-title":"Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval","author":"Wang Wenping","year":"2023","unstructured":"Wenping Wang, Yunxi Guo, Chiyao Shen, Shuai Ding, Guangdeng Liao, Hao Fu, and Pramodh Karanth Prabhakar. 2023. Integrity and Junkiness Failure Handling for Embedding-based Retrieval: A Case Study in Social Network Search. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (Taipei, Taiwan) (SIGIR '23). Association for Computing Machinery, New York, NY, USA, 3250\u20133254. 10.1145\/3539618.3591831"},{"key":"e_1_2_1_99_1","volume-title":"Proceedings of the 1st International Conference on Big Data Technologies","author":"Xie Hang","unstructured":"Hang Xie and Tiffany Y. Tang. 2018. 
Vector projection on lyrics and user comments for a lightweight emotion-aware chinese music recommendation system. In Proceedings of the 1st International Conference on Big Data Technologies (Hangzhou, China) (ICBDT '18). Association for Computing Machinery, New York, NY, USA, 88\u201394. 10.1145\/3226116.3226132"},{"key":"e_1_2_1_100_1","volume-title":"Proceedings of the 40th International Conference on Machine Learning","author":"Yasunaga Michihiro","year":"2023","unstructured":"Michihiro Yasunaga, Armen Aghajanyan, Weijia Shi, Rich James, Jure Leskovec, Percy Liang, Mike Lewis, Luke Zettlemoyer, and Wen-tau Yih. 2023. Retrieval-augmented multimodal language modeling. In Proceedings of the 40th International Conference on Machine Learning (Honolulu, Hawaii, USA) (ICML'23). JMLR.org, Article 1659, 15 pages."},{"key":"e_1_2_1_101_1","first-page":"311","article-title":"Data structures and algorithms for nearest neighbor search in general metric spaces","volume":"93","author":"Yianilos Peter N","year":"1993","unstructured":"Peter N Yianilos. 1993. Data structures and algorithms for nearest neighbor search in general metric spaces. In Soda, Vol. 93. 311\u201321.","journal-title":"Soda"},{"key":"e_1_2_1_102_1","doi-asserted-by":"crossref","first-page":"100162","DOI":"10.1016\/j.tbench.2024.100162","article-title":"Evaluatology: The science and engineering of evaluation","volume":"4","author":"Zhan Jianfeng","year":"2024","unstructured":"Jianfeng Zhan, Lei Wang, Wanling Gao, Hongxiao Li, Chenxi Wang, Yunyou Huang, Yatao Li, Zhengxin Yang, Guoxin Kang, Chunjie Luo, et al. 2024. Evaluatology: The science and engineering of evaluation. 
BenchCouncil Transactions on Benchmarks, Standards and Evaluations 4, 1 (2024), 100162.","journal-title":"BenchCouncil Transactions on Benchmarks, Standards and Evaluations"},{"key":"e_1_2_1_103_1","volume-title":"Junbo Jake Zhao, and Yann LeCun","author":"Zhang Xiang","year":"2015","unstructured":"Xiang Zhang, Junbo Jake Zhao, and Yann LeCun. 2015. Character-level Convolutional Networks for Text Classification. In NIPS."},{"key":"e_1_2_1_104_1","doi-asserted-by":"crossref","first-page":"4481","DOI":"10.14778\/3685800.3685905","article-title":"Chat2Data: An Interactive Data Analysis System with RAG, Vector Databases and LLMs","volume":"17","author":"Zhao Xinyang","year":"2024","unstructured":"Xinyang Zhao, Xuanhe Zhou, and Guoliang Li. 2024. Chat2Data: An Interactive Data Analysis System with RAG, Vector Databases and LLMs. Proceedings of the VLDB Endowment 17, 12 (2024), 4481\u20134484.","journal-title":"Proceedings of the VLDB Endowment"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3718057.3718078","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,27]],"date-time":"2025-08-27T18:13:09Z","timestamp":1756318389000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3718057.3718078"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1]]},"references-count":104,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2025,1]]}},"alternative-id":["10.14778\/3718057.3718078"],"URL":"https:\/\/doi.org\/10.14778\/3718057.3718078","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2025,1]]},"assertion":[{"value":"2025-08-27","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}