{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,13]],"date-time":"2026-06-13T04:58:39Z","timestamp":1781326719048,"version":"3.54.1"},"reference-count":58,"publisher":"Association for Computing Machinery (ACM)","issue":"6","funder":[{"name":"Beijing Natural Science Foundation","award":["L242016"],"award-info":[{"award-number":["L242016"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62332011"],"award-info":[{"award-number":["62332011"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2025,12,4]]},"abstract":"<jats:p>\n                    Approximate nearest neighbor search (ANNS) is broadly adopted in numerous scenarios. Real-world applications seek efficient ways to search billion-scale vectors in high throughput. On-SSD graph-based ANNS systems have the opportunity to achieve this goal, but the limited CPU computing power becomes a bottleneck. In this paper, we propose a GPU-centric, CPU-assisted ANNS architecture and design\n                    <jats:sc>GustANN,<\/jats:sc>\n                    a billion-scale graph-based vector search system for high throughput and cost-effectiveness. We achieve these goals with three techniques: (1) memory-efficient GPU kernels optimized to minimize the GPU memory usage in the graph search, which allows higher concurrency for GPU and SSD; (2) CPU-assisted transfer to address the PCIe bandwidth bottleneck on the GPU-side; (3) pivot search for inter-SSD load balancing. 
Compared to existing ANNS systems,\n                    <jats:sc>GustANN<\/jats:sc>\n                    achieves at least 2.50\u00d7 higher throughput, and is 2.62\u00d7 more cost-effective (measured in \/QPS).\n                  <\/jats:p>","DOI":"10.1145\/3769799","type":"journal-article","created":{"date-parts":[[2025,12,6]],"date-time":"2025-12-06T04:32:13Z","timestamp":1764995533000},"page":"1-27","source":"Crossref","is-referenced-by-count":0,"title":["High-Throughput, Cost-Effective Billion-Scale Vector Search with a Single GPU"],"prefix":"10.1145","volume":"3","author":[{"ORCID":"https:\/\/orcid.org\/0009-0007-2347-1680","authenticated-orcid":false,"given":"Haodi","family":"Jiang","sequence":"first","affiliation":[{"name":"Tsinghua University, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-4819-5786","authenticated-orcid":false,"given":"Hao","family":"Guo","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6684-8336","authenticated-orcid":false,"given":"Minhui","family":"Xie","sequence":"additional","affiliation":[{"name":"Renmin University of China, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7362-2789","authenticated-orcid":false,"given":"Jiwu","family":"Shu","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6214-5390","authenticated-orcid":false,"given":"Youyou","family":"Lu","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2025,12,5]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"4th Gen Intel\u00ae Xeon\u00ae Scalable 
Processors. https:\/\/ark.intel.com\/content\/www\/us\/en\/ark\/products\/series\/228622\/4th-gen-intel-xeon-scalable-processors.html","unstructured":"2024. 4th Gen Intel\u00ae Xeon\u00ae Scalable Processors. https:\/\/ark.intel.com\/content\/www\/us\/en\/ark\/products\/series\/228622\/4th-gen-intel-xeon-scalable-processors.html."},{"key":"e_1_2_1_2_1","unstructured":"2024. APP Metrics for Intel\u00ae Microprocessors. https:\/\/www.intel.com\/content\/dam\/support\/us\/en\/documents\/processors\/APP-for-Intel-Xeon-Processors.pdf."},{"key":"e_1_2_1_3_1","unstructured":"2024. Google search statistics 2024 (no. of searches per day). https:\/\/www.demandsage.com\/google-search-statistics\/."},{"key":"e_1_2_1_4_1","unstructured":"2024. Introducing OpenAI o1. https:\/\/openai.com\/o1\/."},{"key":"e_1_2_1_5_1","unstructured":"2024. New embedding models and API updates. https:\/\/openai.com\/index\/new-embedding-models-and-api-updates\/."},{"key":"e_1_2_1_6_1","unstructured":"2024. NVIDIA A100 | NVIDIA. https:\/\/www.nvidia.com\/en-us\/data-center\/a100\/."},{"key":"e_1_2_1_7_1","unstructured":"2024. NVIDIA GPUDirect Storage. https:\/\/docs.nvidia.com\/gpudirect-storage\/index.html."},{"key":"e_1_2_1_8_1","unstructured":"2024. NVIDIA Tesla A100 Ampere 40 GB Graphics Processor Accelerator - PCIe 4.0 x16 - Dual Slot. https:\/\/www.amazon.com\/NVIDIA-Ampere-Graphics-Processor-Accelerator\/dp\/B08X13X6HF\/."},{"key":"e_1_2_1_9_1","unstructured":"2024. PowerEdge R760 Rack Server. https:\/\/www.dell.com\/en-us\/shop\/dell-poweredge-servers\/new-poweredge-r760-rack-server\/spd\/poweredge-r760\/pe_r760_15724_vi_vp."},{"key":"e_1_2_1_10_1","unstructured":"2024. PowerEdge R760xa Rack Server. https:\/\/www.dell.com\/en-us\/shop\/cty\/pdp\/spd\/poweredge-r760xa\/."},{"key":"e_1_2_1_11_1","unstructured":"2024. Samsung 32GB DDR5 4800MHz PC5-38400 ECC RDIMM 1Rx4 (EC8 10x4) Single Rank 1.1V Registered DIMM 288-Pin Server RAM Memory M321R4GA0BB0-CQK. 
https:\/\/www.amazon.com\/Samsung-4800MHz-PC5-38400-Registered-M321R4GA0BB0-CQK\/dp\/B0C35566S9."},{"key":"e_1_2_1_12_1","unstructured":"2024. Samsung PM1743 3.84TB NVMe GEN5 E3.S 1T 12000MBps\/12000MBps - MZ3LO3T8HCJR-00A07. https:\/\/www.newegg.com\/p\/2U3-0005-000N6?Item=9SIA12KKA53814."},{"key":"e_1_2_1_13_1","unstructured":"2024. Storage Performance Development Kit. https:\/\/spdk.io."},{"key":"e_1_2_1_14_1","unstructured":"2024. Supermicro 1U SuperStorage Server. https:\/\/store.supermicro.com\/us_en\/1u-superstorage-ssg-121e-ne316r.html."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.14778\/3611479.3611537"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.14778\/3204028.3204034"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01258-8_13"},{"key":"e_1_2_1_18_1","volume-title":"SPANN: Highly-efficient Billion-scale Approximate Nearest Neighbor Search. In 35th Conference on Neural Information Processing Systems (NeurIPS","author":"Chen Qi","year":"2021","unstructured":"Qi Chen, Bing Zhao, Haidong Wang, Mingqin Li, Chuanjie Liu, Zengzhong Li, Mao Yang, and Jingdong Wang. 2021. SPANN: Highly-efficient Billion-scale Approximate Nearest Neighbor Search. 
In 35th Conference on Neural Information Processing Systems (NeurIPS 2021)."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2405.03267"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/177424.177609"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.14778\/3303753.3303754"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3654970"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3709730"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2304576.2304621"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/TBDATA.2022.3161156"},{"key":"e_1_2_1_27_1","first-page":"171","volume-title":"19th USENIX Symposium on Operating Systems Design and Implementation (OSDI 25)","author":"Guo Hao","year":"2025","unstructured":"Hao Guo and Youyou Lu. 2025. Achieving Low-Latency Graph-Based Vector Search via Aligning Best-First Search Algorithm with SSD. In 19th USENIX Symposium on Operating Systems Design and Implementation (OSDI 25). USENIX Association, Boston, MA, 171-186."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.616"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394486.3403305"},{"key":"e_1_2_1_30_1","volume-title":"Ravishankar Krishnaswamy, and Rohan Kadekodi.","author":"Subramanya Suhas Jayaram","year":"2019","unstructured":"Suhas Jayaram Subramanya, Fnu Devvrit, Harsha Vardhan Simhadri, Ravishankar Krishnaswamy, and Rohan Kadekodi. 2019. DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node. In Advances in Neural Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alch\u00e9-Buc, E. Fox, and R. Garnett (Eds.), Vol. 32. Curran Associates, Inc. 
https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2019\/file\/09853c7fb1d3f8ee67a61b6bf4a7f8e6-Paper.pdf"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2011.5946540"},{"key":"e_1_2_1_32_1","volume-title":"RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation. CoRR abs\/2404.12457","author":"Jin Chao","year":"2024","unstructured":"Chao Jin, Zili Zhang, Xuanlin Jiang, Fangyue Liu, Xin Liu, Xuanzhe Liu, and Xin Jin. 2024. RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation. CoRR abs\/2404.12457 (2024). arXiv:2404.12457 doi:10.48550\/ARXIV.2404.12457"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/TBDATA.2019.2921572"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2010.57"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.5555\/3495724.3496517"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3447548.3467101"},{"key":"e_1_2_1_37_1","unstructured":"Yuan Lin and Vinod Grover. 2025. Using CUDA Warp-Level Primitives. https:\/\/developer.nvidia.com\/blog\/using-cuda-warp-level-primitives\/."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3651890.3672274"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.is.2013.10.006"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE60146.2024.00323"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3575693.3575748"},{"key":"e_1_2_1_43_1","unstructured":"Harsha Simhadri. 2024. Research talk: Approximate nearest neighbor search systems at scale. https:\/\/www.youtube.com\/watch?v=BnYNdSIKibQ&list=PLD7HFcN7LXReJTWFKYqwMcCc1nZKIXBo9&index=9."},{"key":"e_1_2_1_44_1","first-page":"177","article-title":"Results of the NeurIPS'21 challenge on billion-scale approximate nearest neighbor search. 
In NeurIPS 2021 Competitions and Demonstrations Track","author":"Simhadri Harsha Vardhan","year":"2022","unstructured":"Harsha Vardhan Simhadri, George Williams, Martin Aum\u00fcller, Matthijs Douze, Artem Babenko, Dmitry Baranchuk, Qi Chen, Lucas Hosseini, Ravishankar Krishnaswamy, Gopal Srinivasa, et al. 2022. Results of the NeurIPS'21 challenge on billion-scale approximate nearest neighbor search. In NeurIPS 2021 Competitions and Demonstrations Track. PMLR, 177-189.","journal-title":"PMLR"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-15286-3_16"},{"key":"e_1_2_1_46_1","first-page":"1135","volume-title":"Scalable Billion-point Approximate Nearest Neighbor Search Using SmartSSDs. In 2024 USENIX Annual Technical Conference (USENIX ATC 24)","author":"Tian Bing","year":"2024","unstructured":"Bing Tian, Haikun Liu, Zhuohui Duan, Xiaofei Liao, Hai Jin, and Yu Zhang. 2024. Scalable Billion-point Approximate Nearest Neighbor Search Using SmartSSDs. In 2024 USENIX Annual Technical Conference (USENIX ATC 24). USENIX Association, Santa Clara, CA, 1135-1150. https:\/\/www.usenix.org\/conference\/atc24\/presentation\/tian"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.5555\/3724648.3724659"},{"key":"e_1_2_1_48_1","volume-title":"Harsha Vardhan Simhadri, and Jyothi Vedurada","author":"Karthik","year":"2024","unstructured":"Karthik V., Saim Khan, Somesh Singh, Harsha Vardhan Simhadri, and Jyothi Vedurada. 2024. BANG: Billion-Scale Approximate Nearest Neighbor Search using a Single GPU. CoRR abs\/2401.11324 (2024). arXiv:2401.11324 doi:10.48550\/ARXIV.2401.11324"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/3639269"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/3600006.3613166"},{"key":"e_1_2_1_51_1","unstructured":"Jintang Xue, Yun-Cheng Wang, Chengwei Wei, and C. C. Jay Kuo. 2024. Word Embedding Dimension Reduction via Weakly-Supervised Feature Selection. 
arXiv:2407.12342 [cs.CL] https:\/\/arxiv.org\/abs\/2407.12342"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.226"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/3709661"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/3357384.3357938"},{"key":"e_1_2_1_55_1","first-page":"377","volume-title":"17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23)","author":"Zhang Qianxi","year":"2023","unstructured":"Qianxi Zhang, Shuotao Xu, Qi Chen, Guoxin Sui, Jiadong Xie, Zhizhen Cai, Yaoqi Chen, Yinxuan He, Yuqing Yang, Fan Yang, Mao Yang, and Lidong Zhou. 2023. VBASE: Unifying Online Vector Similarity Search and Relational Queries via Relaxed Monotonicity. In 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23). USENIX Association, Boston, MA, 377-395. https:\/\/www.usenix.org\/conference\/osdi23\/presentation\/zhang-qianxi"},{"key":"e_1_2_1_56_1","first-page":"23","volume-title":"Fast Vector Query Processing for Large Datasets Beyond GPU Memory with Reordered Pipelining. In 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24)","author":"Zhang Zili","year":"2024","unstructured":"Zili Zhang, Fangyue Liu, Gang Huang, Xuanzhe Liu, and Xin Jin. 2024. Fast Vector Query Processing for Large Datasets Beyond GPU Memory with Reordered Pipelining. In 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24). USENIX Association, Santa Clara, CA, 23-40. 
https:\/\/www.usenix.org\/conference\/nsdi24\/presentation\/zhang-zili-pipelining"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE48307.2020.00094"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1021\/ACS.JCIM.0C00393"}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3769799","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,6,13]],"date-time":"2026-06-13T04:47:42Z","timestamp":1781326062000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3769799"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,4]]},"references-count":58,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2025,12,4]]}},"alternative-id":["10.1145\/3769799"],"URL":"https:\/\/doi.org\/10.1145\/3769799","relation":{},"ISSN":["2836-6573"],"issn-type":[{"value":"2836-6573","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,12,4]]}}}