{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,16]],"date-time":"2026-01-16T00:39:29Z","timestamp":1768523969077,"version":"3.49.0"},"reference-count":93,"publisher":"Association for Computing Machinery (ACM)","issue":"1","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2024,9]]},"abstract":"<jats:p>\n            A Retrieval-Augmented Language Model (RALM) combines a large language model (LLM) with a vector database to retrieve context-specific knowledge during text generation. This strategy facilitates impressive generation quality even with smaller models, thus reducing computational demands by orders of magnitude. To serve RALMs efficiently and flexibly, we propose\n            <jats:italic>Chameleon<\/jats:italic>\n            , a heterogeneous accelerator system integrating both LLM and vector search accelerators in a disaggregated architecture. The\n            <jats:italic>heterogeneity<\/jats:italic>\n            ensures efficient serving for both inference and retrieval, while the\n            <jats:italic>disaggregation<\/jats:italic>\n            allows independent scaling of LLM and vector search accelerators to fulfill diverse RALM requirements. Our Chameleon prototype implements vector search accelerators on FPGAs and assigns LLM inference to GPUs, with CPUs as cluster coordinators. Evaluated on various RALMs, Chameleon exhibits up to 2.16\u00d7 reduction in latency and 3.18\u00d7 speedup in throughput compared to the hybrid CPU-GPU architecture. The promising results pave the way for adopting heterogeneous accelerators for not only LLM inference but also vector search in future RALM systems.\n          <\/jats:p>","DOI":"10.14778\/3696435.3696439","type":"journal-article","created":{"date-parts":[[2025,2,11]],"date-time":"2025-02-11T20:41:46Z","timestamp":1739306506000},"page":"42-52","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":10,"title":["Chameleon: A Heterogeneous and Disaggregated Accelerator System for Retrieval-Augmented Language Models"],"prefix":"10.14778","volume":"18","author":[{"given":"Wenqi","family":"Jiang","sequence":"first","affiliation":[{"name":"Systems Group, ETH Zurich"}]},{"given":"Marco","family":"Zeller","sequence":"additional","affiliation":[{"name":"Systems Group, ETH Zurich"}]},{"given":"Roger","family":"Waleffe","sequence":"additional","affiliation":[{"name":"University of Wisconsin Madison"}]},{"given":"Torsten","family":"Hoefler","sequence":"additional","affiliation":[{"name":"SPCL, ETH Zurich"}]},{"given":"Gustavo","family":"Alonso","sequence":"additional","affiliation":[{"name":"Systems Group, ETH Zurich"}]}],"member":"320","published-online":{"date-parts":[[2025,2,11]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"[n.d.]. Faiss. https:\/\/github.com\/facebookresearch\/faiss\/."},{"key":"e_1_2_1_2_1","unstructured":"[n.d.]. FasterTransformer. https:\/\/github.com\/NVIDIA\/FasterTransformer."},{"key":"e_1_2_1_3_1","unstructured":"[n.d.]. The Implications of OpenAI's Latest Update on RAG and Vector-Only Databases. https:\/\/medium.com\/@vishalkalia.er\/the-implications-of-openais-latest-update-on-rag-and-vector-only-databases-c3f326cce0a1."},{"key":"e_1_2_1_4_1","unstructured":"[n.d.]. What does OpenAI's announcement mean for Retrieval Augmented Generation (RAG) and Vector-only Databases? https:\/\/medium.com\/madhukarkumar\/what-does-openais-announcement-mean-for-retrieval-augmented-generation-rag-and-vector-only-54bfc34cba2c."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/215399.215427"},{"key":"e_1_2_1_6_1","volume-title":"International Conference on Machine Learning. PMLR, 468--485","author":"Alon Uri","year":"2022","unstructured":"Uri Alon, Frank Xu, Junxian He, Sudipta Sengupta, Dan Roth, and Graham Neubig. 2022. Neuro-symbolic language modeling with automaton-augmented retrieval. In International Conference on Machine Learning. PMLR, 468--485."},{"key":"e_1_2_1_7_1","volume-title":"International conference on machine learning. PMLR, 2206--2240","author":"Borgeaud Sebastian","year":"2022","unstructured":"Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George Bm Van Den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, et al. 2022. Improving language models by retrieving from trillions of tokens. In International conference on machine learning. PMLR, 2206--2240."},{"key":"e_1_2_1_8_1","unstructured":"Tom Brown Benjamin Mann Nick Ryder Melanie Subbiah Jared D Kaplan Prafulla Dhariwal Arvind Neelakantan Pranav Shyam Girish Sastry Amanda Askell et al. 2020. Language models are few-shot learners. Advances in neural information processing systems 33 (2020) 1877--1901."},{"key":"e_1_2_1_9_1","volume-title":"SPANN: Highly-efficient Billion-scale Approximate Nearest Neighbor Search. arXiv preprint arXiv:2111.08566","author":"Chen Qi","year":"2021","unstructured":"Qi Chen, Bing Zhao, Haidong Wang, Mingqin Li, Chuanjie Liu, Zengzhong Li, Mao Yang, and Jingdong Wang. 2021. SPANN: Highly-efficient Billion-scale Approximate Nearest Neighbor Search. arXiv preprint arXiv:2111.08566 (2021)."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.future.2019.04.033"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3323873.3325018"},{"key":"e_1_2_1_12_1","volume-title":"Charles Sutton, Sebastian Gehrmann, et al.","author":"Chowdhery Aakanksha","year":"2022","unstructured":"Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, et al. 2022. Palm: Scaling language modeling with pathways. arXiv preprint arXiv:2204.02311 (2022)."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/155332.155333"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.14778\/2735461.2735463"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/997817.997857"},{"key":"e_1_2_1_16_1","volume-title":"Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805","author":"Devlin Jacob","year":"2018","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)."},{"key":"e_1_2_1_17_1","volume-title":"Fast approximate nearest neighbor search with the navigating spreading-out graph. arXiv preprint arXiv:1707.00143","author":"Fu Cong","year":"2017","unstructured":"Cong Fu, Chao Xiang, Changxu Wang, and Deng Cai. 2017. Fast approximate nearest neighbor search with the navigating spreading-out graph. arXiv preprint arXiv:1707.00143 (2017)."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3589282"},{"key":"e_1_2_1_19_1","first-page":"1","article-title":"RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor Search","volume":"2","author":"Gao Jianyang","year":"2024","unstructured":"Jianyang Gao and Cheng Long. 2024. RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor Search. Proceedings of the ACM on Management of Data 2, 3 (2024), 1--27.","journal-title":"Proceedings of the ACM on Management of Data"},{"key":"e_1_2_1_20_1","volume-title":"Optimized product quantization","author":"Ge Tiezheng","year":"2013","unstructured":"Tiezheng Ge, Kaiming He, Qifa Ke, and Jian Sun. 2013. Optimized product quantization. IEEE transactions on pattern analysis and machine intelligence 36, 4 (2013), 744--755."},{"key":"e_1_2_1_21_1","first-page":"518","article-title":"Similarity search in high dimensions via hashing","volume":"99","author":"Gionis Aristides","year":"1999","unstructured":"Aristides Gionis, Piotr Indyk, Rajeev Motwani, et al. 1999. Similarity search in high dimensions via hashing. In Vldb, Vol. 99. 518--529.","journal-title":"Vldb"},{"key":"e_1_2_1_22_1","volume-title":"Manu: A Cloud Native Vector Database Management System. arXiv preprint arXiv:2206.13843","author":"Guo Rentong","year":"2022","unstructured":"Rentong Guo, Xiaofan Luan, Long Xiang, Xiao Yan, Xiaomeng Yi, Jigao Luo, Qianya Cheng, Weizhi Xu, Jiarui Luo, Frank Liu, et al. 2022. Manu: A Cloud Native Vector Database Management System. arXiv preprint arXiv:2206.13843 (2022)."},{"key":"e_1_2_1_23_1","volume-title":"International conference on machine learning. PMLR, 3929--3938","author":"Guu Kelvin","year":"2020","unstructured":"Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Mingwei Chang. 2020. Retrieval augmented language model pre-training. In International conference on machine learning. PMLR, 3929--3938."},{"key":"e_1_2_1_24_1","volume-title":"Realm: Retrieval-augmented language model pre-training. arXiv preprint arXiv:2002.08909","author":"Guu Kelvin","year":"2020","unstructured":"Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Ming-Wei Chang. 2020. Realm: Retrieval-augmented language model pre-training. arXiv preprint arXiv:2002.08909 (2020)."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/FPL53798.2021.00040"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2007.370593"},{"key":"e_1_2_1_27_1","volume-title":"Energy, memory, and runtime tradeoffs for implementing collective communication operations. Supercomputing frontiers and innovations 1, 2","author":"Hoefler Torsten","year":"2014","unstructured":"Torsten Hoefler and Dmitry Moor. 2014. Energy, memory, and runtime tradeoffs for implementing collective communication operations. Supercomputing frontiers and innovations 1, 2 (2014), 58--75."},{"key":"e_1_2_1_28_1","volume-title":"2022 55th IEEE\/ACM International Symposium on Microarchitecture (MICRO). IEEE, 763--783","author":"Hu Han-Wen","year":"2022","unstructured":"Han-Wen Hu, Wei-Chen Wang, Yuan-Hao Chang, Yung-Chun Lee, Bo-Rong Lin, Huai-Mu Wang, Yen-Po Lin, Yu-Ming Huang, Chong-Ying Lee, Tzu-Hsiang Su, et al. 2022. ICE: An Intelligent Cognition Engine with 3D NAND-based In-Memory Computing for Vector Similarity Search Acceleration. In 2022 55th IEEE\/ACM International Symposium on Microarchitecture (MICRO). IEEE, 763--783."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/FPL.2014.6927413"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3457240"},{"key":"e_1_2_1_31_1","volume-title":"Leveraging passage retrieval with generative models for open domain question answering. arXiv preprint arXiv:2007.01282","author":"Izacard Gautier","year":"2020","unstructured":"Gautier Izacard and Edouard Grave. 2020. Leveraging passage retrieval with generative models for open domain question answering. arXiv preprint arXiv:2007.01282 (2020)."},{"key":"e_1_2_1_32_1","volume-title":"Few-shot learning with retrieval augmented language models. arXiv preprint arXiv:2208.03299","author":"Izacard Gautier","year":"2022","unstructured":"Gautier Izacard, Patrick Lewis, Maria Lomeli, Lucas Hosseini, Fabio Petroni, Timo Schick, Jane Dwivedi-Yu, Armand Joulin, Sebastian Riedel, and Edouard Grave. 2022. Few-shot learning with retrieval augmented language models. arXiv preprint arXiv:2208.03299 (2022)."},{"key":"e_1_2_1_33_1","volume-title":"2023 USENIX Annual Technical Conference (USENIX ATC 23)","author":"Jang Junhyeok","year":"2023","unstructured":"Junhyeok Jang, Hanjin Choi, Hanyeoreum Bae, Seungjun Lee, Miryeong Kwon, and Myoungsoo Jung. 2023. {CXL-ANNS}:{Software-Hardware} Collaborative Memory Disaggregation and Computation for {Billion-Scale} Approximate Nearest Neighbor Search. In 2023 USENIX Annual Technical Conference (USENIX ATC 23). 585--600."},{"key":"e_1_2_1_34_1","volume-title":"Ravishankar Krishnawamy, and Rohan Kadekodi.","author":"Subramanya Suhas Jayaram","year":"2019","unstructured":"Suhas Jayaram Subramanya, Fnu Devvrit, Harsha Vardhan Simhadri, Ravishankar Krishnawamy, and Rohan Kadekodi. 2019. Diskann: Fast accurate billion-point nearest neighbor search on a single node. Advances in Neural Information Processing Systems 32 (2019)."},{"key":"e_1_2_1_35_1","volume-title":"Product quantization for nearest neighbor search","author":"Jegou Herve","year":"2010","unstructured":"Herve Jegou, Matthijs Douze, and Cordelia Schmid. 2010. Product quantization for nearest neighbor search. IEEE transactions on pattern analysis and machine intelligence 33, 1 (2010), 117--128."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2749447"},{"key":"e_1_2_1_37_1","volume-title":"Accelerating Graph-based Vector Search via Delayed-Synchronization Traversal. arXiv preprint arXiv:2406.12385","author":"Jiang Wenqi","year":"2024","unstructured":"Wenqi Jiang, Hang Hu, Torsten Hoefler, and Gustavo Alonso. 2024. Accelerating Graph-based Vector Search via Delayed-Synchronization Traversal. arXiv preprint arXiv:2406.12385 (2024)."},{"key":"e_1_2_1_38_1","volume-title":"Zhenhao He, Runbin Shi, Cedric Renggli, Shuai Zhang, Theodoros Rekatsinas, Torsten Hoefler, et al.","author":"Jiang Wenqi","year":"2023","unstructured":"Wenqi Jiang, Shigang Li, Yu Zhu, Johannes de Fine Licht, Zhenhao He, Runbin Shi, Cedric Renggli, Shuai Zhang, Theodoros Rekatsinas, Torsten Hoefler, et al. 2023. Co-design Hardware and Algorithm for Vector Search. arXiv preprint arXiv:2306.11182 (2023)."},{"key":"e_1_2_1_39_1","volume-title":"Piperag: Fast retrieval-augmented generation via algorithm-system co-design. arXiv preprint arXiv:2403.05676","author":"Jiang Wenqi","year":"2024","unstructured":"Wenqi Jiang, Shuai Zhang, Boran Han, Jie Wang, Bernie Wang, and Tim Kraska. 2024. Piperag: Fast retrieval-augmented generation via algorithm-system co-design. arXiv preprint arXiv:2403.05676 (2024)."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/TBDATA.2019.2921572"},{"key":"e_1_2_1_41_1","volume-title":"Nearest neighbor machine translation. arXiv preprint arXiv:2010.00710","author":"Khandelwal Urvashi","year":"2020","unstructured":"Urvashi Khandelwal, Angela Fan, Dan Jurafsky, Luke Zettlemoyer, and Mike Lewis. 2020. Nearest neighbor machine translation. arXiv preprint arXiv:2010.00710 (2020)."},{"key":"e_1_2_1_42_1","volume-title":"Generalization through memorization: Nearest neighbor language models. arXiv preprint arXiv:1911.00172","author":"Khandelwal Urvashi","year":"2019","unstructured":"Urvashi Khandelwal, Omer Levy, Dan Jurafsky, Luke Zettlemoyer, and Mike Lewis. 2019. Generalization through memorization: Nearest neighbor language models. arXiv preprint arXiv:1911.00172 (2019)."},{"key":"e_1_2_1_43_1","volume-title":"Internet-augmented dialogue generation. arXiv preprint arXiv:2107.07566","author":"Komeili Mojtaba","year":"2021","unstructured":"Mojtaba Komeili, Kurt Shuster, and Jason Weston. 2021. Internet-augmented dialogue generation. arXiv preprint arXiv:2107.07566 (2021)."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3600006.3613165"},{"key":"e_1_2_1_45_1","volume-title":"ANNA: Specialized Architecture for Approximate Nearest Neighbor Search. In 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA). IEEE, 169--183","author":"Lee Yejin","year":"2022","unstructured":"Yejin Lee, Hyunji Choi, Sunhong Min, Hyunseung Lee, Sangwon Beak, Dawoon Jeong, Jae W Lee, and Tae Jun Ham. 2022. ANNA: Specialized Architecture for Approximate Nearest Neighbor Search. In 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA). IEEE, 169--183."},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2008.130"},{"key":"e_1_2_1_48_1","first-page":"18470","article-title":"Pre-training via paraphrasing","volume":"33","author":"Lewis Mike","year":"2020","unstructured":"Mike Lewis, Marjan Ghazvininejad, Gargi Ghosh, Armen Aghajanyan, Sida Wang, and Luke Zettlemoyer. 2020. Pre-training via paraphrasing. Advances in Neural Information Processing Systems 33 (2020), 18470--18481.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_1_49_1","first-page":"9459","article-title":"Retrieval-augmented generation for knowledge-intensive nlp tasks","volume":"33","author":"Lewis Patrick","year":"2020","unstructured":"Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich K\u00fcttler, Mike Lewis, Wen-tau Yih, Tim Rockt\u00e4schel, et al. 2020. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems 33 (2020), 9459--9474.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_1_50_1","volume-title":"The dark side of chatgpt: Legal and ethical challenges from stochastic parrots and hallucination. arXiv preprint arXiv:2304.14347","author":"Zihao Li.","year":"2023","unstructured":"Zihao Li. 2023. The dark side of chatgpt: Legal and ethical challenges from stochastic parrots and hallucination. arXiv preprint arXiv:2304.14347 (2023)."},{"key":"e_1_2_1_51_1","first-page":"21698","article-title":"Decoupled context processing for context augmented language modeling","volume":"35","author":"Li Zonglin","year":"2022","unstructured":"Zonglin Li, Ruiqi Guo, and Sanjiv Kumar. 2022. Decoupled context processing for context augmented language modeling. Advances in Neural Information Processing Systems 35 (2022), 21698--21710.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_1_52_1","volume-title":"JUNO: Optimizing High-Dimensional Approximate Nearest Neighbour Search with Sparsity-Aware Algorithm and Ray-Tracing Core Mapping. arXiv preprint arXiv:2312.01712","author":"Liu Zihan","year":"2023","unstructured":"Zihan Liu, Wentao Ni, Jingwen Leng, Yu Feng, Cong Guo, Quan Chen, Chao Li, Minyi Guo, and Yuhao Zhu. 2023. JUNO: Optimizing High-Dimensional Approximate Nearest Neighbour Search with Sparsity-Aware Algorithm and Ray-Tracing Core Mapping. arXiv preprint arXiv:2312.01712 (2023)."},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.14778\/3489496.3489506"},{"key":"e_1_2_1_54_1","volume-title":"Efficient processing of k nearest neighbor joins using mapreduce. arXiv preprint arXiv:1207.0141","author":"Lu Wei","year":"2012","unstructured":"Wei Lu, Yanyan Shen, Su Chen, and Beng Chin Ooi. 2012. Efficient processing of k nearest neighbor joins using mapreduce. arXiv preprint arXiv:1207.0141 (2012)."},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.is.2013.10.006"},{"key":"e_1_2_1_56_1","volume-title":"Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs","author":"Malkov Yu A","year":"2018","unstructured":"Yu A Malkov and Dmitry A Yashunin. 2018. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. IEEE transactions on pattern analysis and machine intelligence 42, 4 (2018), 824--836."},{"key":"e_1_2_1_57_1","volume-title":"Fast nearest neighbor machine translation. arXiv preprint arXiv:2105.14528","author":"Meng Yuxian","year":"2021","unstructured":"Yuxian Meng, Xiaoya Li, Xiayu Zheng, Fei Wu, Xiaofei Sun, Tianwei Zhang, and Jiwei Li. 2021. Fast nearest neighbor machine translation. arXiv preprint arXiv:2105.14528 (2021)."},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/3589777"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/3341301.3359646"},{"key":"e_1_2_1_60_1","volume-title":"fairseq: A fast, extensible toolkit for sequence modeling. arXiv preprint arXiv:1904.01038","author":"Ott Myle","year":"2019","unstructured":"Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, and Michael Auli. 2019. fairseq: A fast, extensible toolkit for sequence modeling. arXiv preprint arXiv:1904.01038 (2019)."},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3389746"},{"key":"e_1_2_1_62_1","volume-title":"Survey of vector database management systems. arXiv preprint arXiv:2310.14021","author":"Pan James Jie","year":"2023","unstructured":"James Jie Pan, Jianguo Wang, and Guoliang Li. 2023. Survey of vector database management systems. arXiv preprint arXiv:2310.14021 (2023)."},{"key":"e_1_2_1_63_1","volume-title":"Splitwise: Efficient generative llm inference using phase splitting. arXiv preprint arXiv:2311.18677","author":"Patel Pratyush","year":"2023","unstructured":"Pratyush Patel, Esha Choukse, Chaojie Zhang, \u00cd\u00f1igo Goiri, Aashaka Shah, Saeed Maleki, and Ricardo Bianchini. 2023. Splitwise: Efficient generative llm inference using phase splitting. arXiv preprint arXiv:2311.18677 (2023)."},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCAD51958.2021.9643528"},{"key":"e_1_2_1_65_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3588908","article-title":"Efficient approximate nearest neighbor search in multi-dimensional databases","volume":"1","author":"Peng Yun","year":"2023","unstructured":"Yun Peng, Byron Choi, Tsz Nam Chan, Jianye Yang, and Jianliang Xu. 2023. Efficient approximate nearest neighbor search in multi-dimensional databases. Proceedings of the ACM on Management of Data 1, 1 (2023), 1--27.","journal-title":"Proceedings of the ACM on Management of Data"},{"key":"e_1_2_1_66_1","unstructured":"Alec Radford Jeffrey Wu Rewon Child David Luan Dario Amodei Ilya Sutskever et al. 2019. Language models are unsupervised multitask learners. OpenAI blog 1 8 (2019) 9."},{"key":"e_1_2_1_67_1","unstructured":"Jack W Rae Sebastian Borgeaud Trevor Cai Katie Millican Jordan Hoffmann Francis Song John Aslanides Sarah Henderson Roman Ring Susannah Young et al. 2021. Scaling language models: Methods analysis & insights from training gopher. arXiv preprint arXiv:2112.11446 (2021)."},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.5555\/3455716.3455856"},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC41405.2020.00024"},{"key":"e_1_2_1_70_1","volume-title":"In-context retrieval-augmented language models. arXiv preprint arXiv:2302.00083","author":"Ram Ori","year":"2023","unstructured":"Ori Ram, Yoav Levine, Itay Dalmedigos, Dor Muhlgay, Amnon Shashua, Kevin Leyton-Brown, and Yoav Shoham. 2023. In-context retrieval-augmented language models. arXiv preprint arXiv:2302.00083 (2023)."},{"key":"e_1_2_1_71_1","first-page":"10672","article-title":"Hm-ann: Efficient billion-point nearest neighbor search on heterogeneous memory","volume":"33","author":"Ren Jie","year":"2020","unstructured":"Jie Ren, Minjia Zhang, and Dong Li. 2020. Hm-ann: Efficient billion-point nearest neighbor search on heterogeneous memory. Advances in Neural Information Processing Systems 33 (2020), 10672--10684.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_1_72_1","volume-title":"End-to-end training of neural retrievers for open-domain question answering. arXiv preprint arXiv:2101.00408","author":"Sachan Devendra Singh","year":"2021","unstructured":"Devendra Singh Sachan, Mostofa Patwary, Mohammad Shoeybi, Neel Kant, Wei Ping, William L Hamilton, and Bryan Catanzaro. 2021. End-to-end training of neural retrievers for open-domain question answering. arXiv preprint arXiv:2101.00408 (2021)."},{"key":"e_1_2_1_73_1","volume-title":"Megatron-lm: Training multi-billion parameter language models using model parallelism. arXiv preprint arXiv:1909.08053","author":"Shoeybi Mohammad","year":"2019","unstructured":"Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, and Bryan Catanzaro. 2019. Megatron-lm: Training multi-billion parameter language models using model parallelism. arXiv preprint arXiv:1909.08053 (2019)."},{"key":"e_1_2_1_74_1","unstructured":"Shaden Smith Mostofa Patwary Brandon Norick Patrick LeGresley Samyam Rajbhandari Jared Casper Zhun Liu Shrimai Prabhumoye George Zerveas Vijay Korthikanti et al. 2022. Using deepspeed and megatron to train megatron-turing nlg 530b a large-scale generative language model. arXiv preprint arXiv:2201.11990 (2022)."},{"key":"e_1_2_1_75_1","doi-asserted-by":"publisher","DOI":"10.14778\/2735461.2735462"},{"key":"e_1_2_1_76_1","volume-title":"Instructretro: Instruction tuning post retrieval-augmented pretraining. arXiv preprint arXiv:2310.07713","author":"Wang Boxin","year":"2023","unstructured":"Boxin Wang, Wei Ping, Lawrence McAfee, Peng Xu, Bo Li, Mohammad Shoeybi, and Bryan Catanzaro. 2023. Instructretro: Instruction tuning post retrieval-augmented pretraining. arXiv preprint arXiv:2310.07713 (2023)."},{"key":"e_1_2_1_77_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3457550"},{"key":"e_1_2_1_78_1","volume-title":"Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. 923--938","author":"Wang Xiaoyang","year":"2015","unstructured":"Xiaoyang Wang, Ying Zhang, Wenjie Zhang, Xuemin Lin, and Muhammad Aamir Cheema. 2015. Optimal spatial dominance: an effective search of nearest neighbor candidates. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. 923--938."},{"key":"e_1_2_1_79_1","doi-asserted-by":"publisher","DOI":"10.14778\/3415478.3415541"},{"key":"e_1_2_1_80_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.223"},{"key":"e_1_2_1_81_1","volume-title":"Proceedings of the 2014 ACM SIGMOD international conference on Management of Data. 1139--1150","author":"Wu Yubao","year":"2014","unstructured":"Yubao Wu, Ruoming Jin, and Xiang Zhang. 2014. Fast and unified local search for random walk based k-nearest-neighbor query in large graphs. In Proceedings of the 2014 ACM SIGMOD international conference on Management of Data. 1139--1150."},{"key":"e_1_2_1_82_1","volume-title":"Why do Nearest Neighbor Language Models Work? arXiv preprint arXiv:2301.02828","author":"Xu Frank F","year":"2023","unstructured":"Frank F Xu, Uri Alon, and Graham Neubig. 2023. Why do Nearest Neighbor Language Models Work? arXiv preprint arXiv:2301.02828 (2023)."},{"key":"e_1_2_1_83_1","doi-asserted-by":"publisher","DOI":"10.1145\/3600006.3613166"},{"key":"e_1_2_1_84_1","doi-asserted-by":"publisher","DOI":"10.14778\/2735479.2735492"},{"key":"e_1_2_1_85_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3386131"},{"key":"e_1_2_1_86_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00371"},{"key":"e_1_2_1_87_1","volume-title":"16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22)","author":"Yu Gyeong-In","year":"2022","unstructured":"Gyeong-In Yu, Joo Seong Jeong, Geon-Woo Kim, Soojeong Kim, and Byung-Gon Chun. 2022. Orca: A distributed serving system for {Transformer-Based} generative models. In 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22). 521--538."},{"key":"e_1_2_1_88_1","doi-asserted-by":"crossref","unstructured":"Shulin Zeng Zhenhua Zhu Jun Liu Haoyu Zhang Guohao Dai Zixuan Zhou Shuangchen Li Xuefei Ning Yuan Xie Huazhong Yang et al. 2023. DF-GAS: a Distributed FPGA-as-a-Service Architecture towards Billion-Scale Graph-based Approximate Nearest Neighbor Search. (2023).","DOI":"10.1145\/3613424.3614292"},{"key":"e_1_2_1_89_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00517"},{"key":"e_1_2_1_90_1","volume-title":"17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23)","author":"Zhang Qianxi","year":"2023","unstructured":"Qianxi Zhang, Shuotao Xu, Qi Chen, Guoxin Sui, Jiadong Xie, Zhizhen Cai, Yaoqi Chen, Yinxuan He, Yuqing Yang, Fan Yang, et al. 2023. {VBASE}: Unifying Online Vector Similarity Search and Relational Queries via Relaxed Monotonicity. In 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23). 377--395."},{"key":"e_1_2_1_91_1","doi-asserted-by":"publisher","DOI":"10.14778\/3594512.3594527"},{"key":"e_1_2_1_92_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2882930"},{"key":"e_1_2_1_93_1","volume-title":"Proceedings of the 2016 International Conference on Management of Data. 2053--2068","author":"Zhu Huaijie","year":"2016","unstructured":"Huaijie Zhu, Xiaochun Yang, Bin Wang, and Wang-Chien Lee. 2016. Range-based obstructed nearest neighbor queries. In Proceedings of the 2016 International Conference on Management of Data. 2053--2068."},{"key":"e_1_2_1_94_1","doi-asserted-by":"publisher","DOI":"10.14778\/3603581.3603601"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3696435.3696439","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,11]],"date-time":"2025-02-11T20:43:10Z","timestamp":1739306590000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3696435.3696439"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,9]]},"references-count":93,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,9]]}},"alternative-id":["10.14778\/3696435.3696439"],"URL":"https:\/\/doi.org\/10.14778\/3696435.3696439","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2024,9]]},"assertion":[{"value":"2025-02-11","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}