{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,11]],"date-time":"2025-09-11T18:08:34Z","timestamp":1757614114169,"version":"3.44.0"},"reference-count":80,"publisher":"Association for Computing Machinery (ACM)","issue":"11","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2025,7]]},"abstract":"<jats:p>The rapid advancements of generative artificial intelligence (GenAI) have recently led to renewed attention towards approximate nearest neighbor (ANN) search and vector databases (VectorDB). Among various ANN methodologies, vector quantization techniques like product quantization (PQ) are widely used to generate space-efficient representations for large-scale dense vectors. However, the code-books generated by PQ often reach several gigabytes in size, making them impractical for web-scale, high-dimensional vectors in resource-constrained environments like mobile devices.<\/jats:p>\n          <jats:p>\n            In this study, we propose\n            <jats:bold>SegPQ<\/jats:bold>\n            , a simple yet effective framework for losslessly compressing codebooks generated by\n            <jats:bold>any<\/jats:bold>\n            PQ variants, enabling efficient\n            <jats:bold>in-memory<\/jats:bold>\n            vector search on devices with limited memory. SegPQ represents the raw PQ codewords as a trained error-bounded piecewise linear approximation model (\u03f5-PLA) and pre-computed low-bit residuals. We theoretically demonstrate that, with high probability, the number of bits per compressed codeword is 1.721 + \u2308log\n            <jats:sub>2<\/jats:sub>\n            \u03f5\n            <jats:sup>OPT<\/jats:sup>\n            \u2309, where \u03f5\n            <jats:sup>OPT<\/jats:sup>\n            is the optimal error parameter that can be determined by data characteristics. 
To accelerate query execution, we further design SIMD-aware query processing algorithms on compressed codebooks to fully exploit the hardware parallelism offered by modern architectures. Extensive experimental studies on real datasets showcase that, for\n            <jats:bold>1 billion<\/jats:bold>\n            vectors, SegPQ reduces PQ codebook memory consumption by up to\n            <jats:bold>4.7<\/jats:bold>\n            x (approx.\n            <jats:bold>851 MB<\/jats:bold>\n            ) while incurring only\n            <jats:bold>3.3%<\/jats:bold>\n            additional query processing overhead caused by decompression.\n          <\/jats:p>","DOI":"10.14778\/3749646.3749650","type":"journal-article","created":{"date-parts":[[2025,9,4]],"date-time":"2025-09-04T17:55:06Z","timestamp":1757008506000},"page":"3730-3743","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Not Small Enough? SegPQ: A Learned Approach to Compress Product Quantization Codebooks"],"prefix":"10.14778","volume":"18","author":[{"given":"Qiyu","family":"Liu","sequence":"first","affiliation":[{"name":"Southwest University"}]},{"given":"Yanlin","family":"Qi","sequence":"additional","affiliation":[{"name":"HIT Shenzhen"}]},{"given":"Siyuan","family":"Han","sequence":"additional","affiliation":[{"name":"HKUST"}]},{"given":"Jingshu","family":"Peng","sequence":"additional","affiliation":[{"name":"ByteDance"}]},{"given":"Jin","family":"Li","sequence":"additional","affiliation":[{"name":"Harvard University"}]},{"given":"Lei","family":"Chen","sequence":"additional","affiliation":[{"name":"HKUST &amp; HKUST (GZ)"}]}],"member":"320","published-online":{"date-parts":[[2025,9,4]]},"reference":[{"volume-title":"Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions","author":"Andoni Alexandr","key":"e_1_2_1_1_1","unstructured":"Alexandr Andoni and Piotr Indyk. 2006. 
Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions. In FOCS. IEEE Computer Society, 459\u2013468."},{"key":"e_1_2_1_2_1","volume-title":"42nd International Conference on Very Large Data Bases","volume":"9","author":"Andr\u00e9 Fabien","year":"2016","unstructured":"Fabien Andr\u00e9, Anne-Marie Kermarrec, and Nicolas Le Scouarnec. 2016. Cache locality is not enough: High-performance nearest neighbor search with product quantization fast scan. In 42nd International Conference on Very Large Data Bases, Vol. 9. 12."},{"key":"e_1_2_1_3_1","unstructured":"apple-flm [n.d.]. Apple Intelligence Foundation Language Models. https:\/\/machinelearning.apple.com\/papers\/apple_intelligence_foundation_language_models.pdf. Accessed: 2024-07-31."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/293347.293348"},{"volume-title":"ACL (tutorial)","author":"Asai Akari","key":"e_1_2_1_5_1","unstructured":"Akari Asai, Sewon Min, Zexuan Zhong, and Danqi Chen. 2023. Retrieval-based Language Models and Applications. In ACL (tutorial). Association for Computational Linguistics, 41\u201346."},{"key":"e_1_2_1_6_1","volume-title":"ANN-Benchmarks: A benchmarking tool for approximate nearest neighbor algorithms. Inf. Syst. 87","author":"Aum\u00fcller Martin","year":"2020","unstructured":"Martin Aum\u00fcller, Erik Bernhardsson, and Alexander John Faithfull. 2020. ANN-Benchmarks: A benchmarking tool for approximate nearest neighbor algorithms. Inf. Syst. 87 (2020)."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.124"},{"key":"e_1_2_1_8_1","volume-title":"Lempitsky","author":"Babenko Artem","year":"2016","unstructured":"Artem Babenko and Victor S. Lempitsky. 2016. Efficient Indexing of Billion-Scale Datasets of Deep Descriptors. In CVPR. 
IEEE Computer Society, 2055\u20132063."},{"key":"e_1_2_1_9_1","unstructured":"Payal Bajaj Daniel Campos Nick Craswell Li Deng Jianfeng Gao Xiaodong Liu Rangan Majumder Andrew McNamara Bhaskar Mitra Tri Nguyen et al. 2016. Ms marco: A human generated machine reading comprehension dataset. arXiv preprint arXiv:1611.09268 (2016)."},{"key":"e_1_2_1_10_1","volume-title":"Guttag","author":"Blalock Davis W.","year":"2017","unstructured":"Davis W. Blalock and John V. Guttag. 2017. Bolt: Accelerated Data Mining with Fast Vector Compression. In KDD. ACM, 727\u2013735."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3524060"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3524060"},{"key":"e_1_2_1_13_1","volume-title":"Distance-Based Indexing for High-Dimensional Metric Spaces. In SIGMOD Conference. ACM Press, 357\u2013368","author":"Bozkaya Tolga","year":"1997","unstructured":"Tolga Bozkaya and Z. Meral \u00d6zsoyoglu. 1997. Distance-Based Indexing for High-Dimensional Metric Spaces. In SIGMOD Conference. ACM Press, 357\u2013368."},{"key":"e_1_2_1_14_1","unstructured":"Tom B. Brown Benjamin Mann Nick Ryder et al. 2020. Language Models are Few-Shot Learners. In NeurIPS."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11280-024-01276-1"},{"key":"e_1_2_1_16_1","article-title":"PaLM: Scaling Language Modeling with Pathways","volume":"24","author":"Chowdhery Aakanksha","year":"2023","unstructured":"Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, et al. 2023. PaLM: Scaling Language Modeling with Pathways. J. Mach. Learn. Res. 24 (2023), 240:1\u2013240:113.","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_2_1_17_1","unstructured":"cohere [n.d.]. Cohere. https:\/\/huggingface.co\/Cohere. Accessed: 2024-11-12."},{"key":"e_1_2_1_18_1","volume-title":"Mirrokni","author":"Datar Mayur","year":"2004","unstructured":"Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S. Mirrokni. 2004. 
Locality-sensitive hashing scheme based on p-stable distributions. In SCG. ACM, 253\u2013262."},{"volume-title":"Order statistics","author":"David Herbert A","key":"e_1_2_1_19_1","unstructured":"Herbert A David and Haikady N Nagaraja. 2004. Order statistics. John Wiley & Sons."},{"key":"e_1_2_1_20_1","volume-title":"Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies","volume":"1","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers). 4171\u20134186."},{"key":"e_1_2_1_21_1","volume-title":"ALEX: An Updatable Adaptive Learned Index. In SIGMOD Conference. ACM, 969\u2013984","author":"Ding Jialin","year":"2020","unstructured":"Jialin Ding, Umar Farooq Minhas, Jia Yu, Chi Wang, Jaeyoung Do, Yinan Li, Hantian Zhang, Badrish Chandramouli, Johannes Gehrke, Donald Kossmann, David B. Lomet, and Tim Kraska. 2020. ALEX: An Updatable Adaptive Learned Index. In SIGMOD Conference. ACM, 969\u2013984."},{"key":"e_1_2_1_22_1","unstructured":"faiss [n.d.]. Faiss. https:\/\/faiss.ai\/. Accessed: 2024-06-12."},{"key":"e_1_2_1_23_1","volume-title":"ICML (Proceedings of Machine Learning Research)","volume":"119","author":"Ferragina Paolo","year":"2020","unstructured":"Paolo Ferragina, Fabrizio Lillo, and Giorgio Vinciguerra. 2020. Why Are Learned Indexes So Effective?. In ICML (Proceedings of Machine Learning Research), Vol. 119. PMLR, 3123\u20133132."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.14778\/3389133.3389135"},{"key":"e_1_2_1_25_1","volume-title":"Retrieval-Augmented Generation for Large Language Models: A Survey. 
CoRR abs\/2312.10997","author":"Gao Yunfan","year":"2023","unstructured":"Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Qianyu Guo, Meng Wang, and Haofen Wang. 2023. Retrieval-Augmented Generation for Large Language Models: A Survey. CoRR abs\/2312.10997 (2023)."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2013.240"},{"volume-title":"Safer","year":"2024","key":"e_1_2_1_27_1","unstructured":"gemma2b [n.d.]. Smaller, Safer, More Transparent: Advancing Responsible AI with Gemma. https:\/\/developers.googleblog.com\/en\/smaller-safer-more-transparent-advancing-responsible-ai-with-gemma\/. Accessed: 2024-08-02."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2010.57"},{"volume-title":"Searching in one billion vectors: Re-rank with source coding","author":"J\u00e9gou Herv\u00e9","key":"e_1_2_1_29_1","unstructured":"Herv\u00e9 J\u00e9gou, Romain Tavenard, Matthijs Douze, and Laurent Amsaleg. 2011. Searching in one billion vectors: Re-rank with source coding. In ICASSP. IEEE, 861\u2013864."},{"key":"e_1_2_1_30_1","volume-title":"Survey of Hallucination in Natural Language Generation. ACM Comput. Surv. 55, 12","author":"Ji Ziwei","year":"2023","unstructured":"Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Yejin Bang, Andrea Madotto, and Pascale Fung. 2023. Survey of Hallucination in Natural Language Generation. ACM Comput. Surv. 55, 12 (2023), 248:1\u2013248:38."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.2012.737745"},{"key":"e_1_2_1_32_1","volume-title":"The Case for Learned Index Structures. In SIGMOD Conference. ACM, 489\u2013504","author":"Kraska Tim","year":"2018","unstructured":"Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, and Neoklis Polyzotis. 2018. The Case for Learned Index Structures. In SIGMOD Conference. 
ACM, 489\u2013504."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/276698.276877"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00276"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1612"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2019.2909204"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476249.3476305"},{"key":"e_1_2_1_38_1","volume-title":"Deep Supervised Hashing for Fast Image Retrieval","author":"Liu Haomiao","year":"2016","unstructured":"Haomiao Liu, Ruiping Wang, Shiguang Shan, and Xilin Chen. 2016. Deep Supervised Hashing for Fast Image Retrieval. In CVPR. IEEE Computer Society, 2064\u20132072."},{"key":"e_1_2_1_39_1","volume-title":"Why Are Learned Indexes So Effective but Sometimes Ineffective? arXiv preprint arXiv:2410.00846","author":"Liu Qiyu","year":"2024","unstructured":"Qiyu Liu, Siyuan Han, Yanlin Qi, Jingshu Peng, Jin Li, Longlong Lin, and Lei Chen. 2024. Why Are Learned Indexes So Effective but Sometimes Ineffective? arXiv preprint arXiv:2410.00846 (2024)."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-024-00893-6"},{"key":"e_1_2_1_41_1","volume-title":"BitTuner: A Toolbox for Automatically Configuring Learned Data Compressors. In 2025 IEEE 41st International Conference on Data Engineering (ICDE). IEEE Computer Society, 4548\u20134551","author":"Liu Qiyu","year":"2025","unstructured":"Qiyu Liu, Yuxin Luo, Mengke Cui, Siyuan Han, Jingshu Peng, Jin Li, and Lei Chen. 2025. BitTuner: A Toolbox for Automatically Configuring Learned Data Compressors. In 2025 IEEE 41st International Conference on Data Engineering (ICDE). IEEE Computer Society, 4548\u20134551."},{"volume-title":"LHist: Towards Learning Multi-dimensional Histogram for Massive Spatial Data","author":"Liu Qiyu","key":"e_1_2_1_42_1","unstructured":"Qiyu Liu, Yanyan Shen, and Lei Chen. 2021. 
LHist: Towards Learning Multi-dimensional Histogram for Massive Spatial Data. In ICDE. IEEE, 1188\u20131199."},{"key":"e_1_2_1_43_1","volume-title":"HAP: An Efficient Hamming Space Index Based on Augmented Pigeonhole Principle. In SIGMOD Conference. ACM, 917\u2013930","author":"Liu Qiyu","year":"2022","unstructured":"Qiyu Liu, Yanyan Shen, and Lei Chen. 2022. HAP: An Efficient Hamming Space Index Based on Augmented Pigeonhole Principle. In SIGMOD Conference. ACM, 917\u2013930."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.14778\/3407790.3407830"},{"key":"e_1_2_1_45_1","unstructured":"Ting Liu Andrew W. Moore Alexander G. Gray and Ke Yang. 2004. An Investigation of Practical Approximate Nearest Neighbor Algorithms. In NIPS. 825\u2013832."},{"key":"e_1_2_1_46_1","unstructured":"llamacpp [n.d.]. LLM Inference in C\/C++. https:\/\/github.com\/ggerganov\/llama.cpp. Accessed: 2024-06-12."},{"key":"e_1_2_1_47_1","unstructured":"LZ4 [n.d.]. Extremely fast compression. https:\/\/lz4.org\/. Accessed: 2024-06-12."},{"key":"e_1_2_1_48_1","volume-title":"Proceedings of the fifth Berkeley symposium on mathematical statistics and probability","volume":"1","author":"James","unstructured":"James MacQueen et al. 1967. Some methods for classification and analysis of multivariate observations. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, Vol. 1. Oakland, CA, USA, 281\u2013297."},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.is.2013.10.006"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2018.2889473"},{"key":"e_1_2_1_51_1","volume-title":"Stacked quantizers for compositional vector compression. arXiv preprint arXiv:1411.2173","author":"Martinez Julieta","year":"2014","unstructured":"Julieta Martinez, Holger H Hoos, and James J Little. 2014. Stacked quantizers for compositional vector compression. 
arXiv preprint arXiv:1411.2173 (2014)."},{"key":"e_1_2_1_52_1","unstructured":"milvus [n.d.]. The High-Performance Vector Database Built for Scale. https:\/\/milvus.io\/. Accessed: 2024-06-12."},{"key":"e_1_2_1_53_1","doi-asserted-by":"crossref","unstructured":"Michael Mitzenmacher. 2018. A Model for Learned Bloom Filters and Optimizing by Sandwiching. In NeurIPS. 462\u2013471.","DOI":"10.1007\/978-1-4614-8265-9_751"},{"key":"e_1_2_1_54_1","unstructured":"mlc [n.d.]. MLC LLM. https:\/\/llm.mlc.ai\/. Accessed: 2024-06-12."},{"key":"e_1_2_1_55_1","unstructured":"mlx [n.d.]. MLX: An array framework for Apple silicon. https:\/\/github.com\/ml-explore\/mlx. Accessed: 2024-06-12."},{"key":"e_1_2_1_56_1","unstructured":"OpenAI. 2023. GPT-4 Technical Report. CoRR abs\/2303.08774 (2023)."},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/358746.358758"},{"key":"e_1_2_1_58_1","unstructured":"Long Ouyang Jeffrey Wu Xu Jiang et al. 2022. Training language models to follow instructions with human feedback. In NeurIPS."},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2024.3352100"},{"key":"e_1_2_1_60_1","unstructured":"Igor Pavlov. [n.d.]. LZMA SDK. https:\/\/7-zip.org\/sdk.html. Accessed: 2024-06-12."},{"key":"e_1_2_1_61_1","unstructured":"pinecone [n.d.]. Build knowledgeable AI. https:\/\/www.pinecone.io\/. Accessed: 2024-06-12."},{"key":"e_1_2_1_62_1","volume-title":"Bkd-tree: A dynamic scalable kd-tree","author":"Procopiuc Octavian","year":"2003","unstructured":"Octavian Procopiuc, Pankaj K Agarwal, Lars Arge, and Jeffrey Scott Vitter. 2003. Bkd-tree: A dynamic scalable kd-tree. In SSTD. Springer, 46\u201365."},{"key":"e_1_2_1_63_1","unstructured":"pytorch [n.d.]. PyTorch. https:\/\/pytorch.org\/. Accessed: 2024-06-12."},{"key":"e_1_2_1_64_1","doi-asserted-by":"crossref","unstructured":"Parikshit Ram and Kaushik Sinha. 2019. Revisiting kd-tree for Nearest Neighbor Search. In KDD. 
ACM 1378\u20131388.","DOI":"10.1145\/3292500.3330875"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.14778\/3570690.3570702"},{"volume-title":"Squeezing succinct data structures into entropy bounds","author":"Sadakane Kunihiko","key":"e_1_2_1_66_1","unstructured":"Kunihiko Sadakane and Roberto Grossi. 2006. Squeezing succinct data structures into entropy bounds. In SODA. ACM Press, 1230\u20131239."},{"key":"e_1_2_1_67_1","unstructured":"samsung [n.d.]. How much phone memory and storage do I need? https:\/\/www.samsung.com\/us\/explore\/mobile\/how-much-phone-memory-and-storage-do-I-need\/. Accessed: 2024-08-07."},{"key":"e_1_2_1_68_1","unstructured":"segpq [n.d.]. SegPQ (technical report). https:\/\/github.com\/qyliu-hkust\/segpq. Accessed: 2024-11-12."},{"volume-title":"Energy and Policy Considerations for Modern Deep Learning Research","author":"Strubell Emma","key":"e_1_2_1_69_1","unstructured":"Emma Strubell, Ananya Ganesh, and Andrew McCallum. 2020. Energy and Policy Considerations for Modern Deep Learning Research. In AAAI. AAAI Press, 13693\u201313696."},{"key":"e_1_2_1_70_1","unstructured":"tensorflow [n.d.]. TensorFlow. https:\/\/www.tensorflow.org\/. Accessed: 2024-06-12."},{"key":"e_1_2_1_71_1","volume-title":"LLaMA: Open and Efficient Foundation Language Models. CoRR abs\/2302.13971","author":"Touvron Hugo","year":"2023","unstructured":"Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timoth\u00e9e Lacroix, Baptiste Rozi\u00e8re, Naman Goyal, Eric Hambro, Faisal Azhar, Aur\u00e9lien Rodriguez, Armand Joulin, Edouard Grave, and Guillaume Lample. 2023. LLaMA: Open and Efficient Foundation Language Models. CoRR abs\/2302.13971 (2023)."},{"key":"e_1_2_1_72_1","unstructured":"Hugo Touvron Louis Martin Kevin Stone et al. 2023. Llama 2: Open Foundation and Fine-Tuned Chat Models. 
CoRR abs\/2307.09288 (2023)."},{"key":"e_1_2_1_73_1","unstructured":"Ashish Vaswani Noam Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan N. Gomez Lukasz Kaiser and Illia Polosukhin. 2017. Attention is All you Need. In NIPS. 5998\u20136008."},{"key":"e_1_2_1_74_1","doi-asserted-by":"crossref","unstructured":"Sebastiano Vigna. 2013. Quasi-succinct indices. In WSDM. ACM 83\u201392.","DOI":"10.1145\/2433396.2433409"},{"key":"e_1_2_1_75_1","doi-asserted-by":"publisher","DOI":"10.14778\/3424573.3424580"},{"key":"e_1_2_1_76_1","unstructured":"wikidata [n.d.]. Wikidata. https:\/\/www.wikidata.org\/wiki\/Wikidata:Main_Page. Accessed: 2024-11-12."},{"key":"e_1_2_1_77_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2018.2817526"},{"key":"e_1_2_1_78_1","doi-asserted-by":"crossref","unstructured":"Hao Yan Shuai Ding and Torsten Suel. 2009. Inverted index compression and query processing with optimized document ordering. In WWW. ACM 401\u2013410.","DOI":"10.1145\/1526709.1526764"},{"volume-title":"Data Structures and Algorithms for Nearest Neighbor Search in General Metric Spaces","author":"Yianilos Peter N.","key":"e_1_2_1_79_1","unstructured":"Peter N. Yianilos. 1993. Data Structures and Algorithms for Nearest Neighbor Search in General Metric Spaces. In SODA. ACM\/SIAM, 311\u2013321."},{"key":"e_1_2_1_80_1","unstructured":"youtube [n.d.]. YouTube-8M. https:\/\/research.google.com\/youtube8m\/download.html. 
Accessed: 2024-06-12."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3749646.3749650","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,5]],"date-time":"2025-09-05T03:18:39Z","timestamp":1757042319000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3749646.3749650"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7]]},"references-count":80,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2025,7]]}},"alternative-id":["10.14778\/3749646.3749650"],"URL":"https:\/\/doi.org\/10.14778\/3749646.3749650","relation":{},"ISSN":["2150-8097"],"issn-type":[{"type":"print","value":"2150-8097"}],"subject":[],"published":{"date-parts":[[2025,7]]},"assertion":[{"value":"2025-09-04","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}