{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,14]],"date-time":"2026-03-14T09:50:40Z","timestamp":1773481840853,"version":"3.50.1"},"reference-count":63,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2024,5,29]],"date-time":"2024-05-29T00:00:00Z","timestamp":1716940800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2024,5,29]]},"abstract":"<jats:p>Similarity search, the task of identifying objects most similar to a given query object under a specific metric, has gathered significant attention due to its practical applications. However, the absence of coordinate information to accelerate similarity search and the high computational cost of measuring object similarity hinder the efficiency of existing CPU-based methods. Additionally, these methods struggle to meet the demand for high throughput data management. To address these challenges, we propose GTS, a GPU-based tree index designed for the parallel processing of similarity search in general metric spaces, where only the distance metric for measuring object similarity is known. The GTS index utilizes a pivot-based tree structure to efficiently prune objects and employs list tables to facilitate GPU computing. To efficiently manage concurrent similarity queries with limited GPU memory, we have developed a two-stage search method that combines batch processing and sequential strategies to optimize memory usage. The paper also introduces an effective update strategy for the proposed GPU-based index, encompassing streaming data updates and batch data updates. Additionally, we present a cost model to evaluate search performance. Extensive experiments on five real-life datasets demonstrate that GTS achieves efficiency gains of up to two orders of magnitude over existing CPU baselines and up to 20x efficiency improvements compared to state-of-the-art GPU-based methods.<\/jats:p>","DOI":"10.1145\/3654945","type":"journal-article","created":{"date-parts":[[2024,5,30]],"date-time":"2024-05-30T09:44:53Z","timestamp":1717062293000},"page":"1-27","source":"Crossref","is-referenced-by-count":7,"title":["GTS: GPU-based Tree Index for Fast Similarity Search"],"prefix":"10.1145","volume":"2","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4555-6232","authenticated-orcid":false,"given":"Yifan","family":"Zhu","sequence":"first","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-8558-3619","authenticated-orcid":false,"given":"Ruiyao","family":"Ma","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9792-9171","authenticated-orcid":false,"given":"Baihua","family":"Zheng","sequence":"additional","affiliation":[{"name":"Singapore Management University, Singapore, Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8082-7398","authenticated-orcid":false,"given":"Xiangyu","family":"Ke","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5685-7017","authenticated-orcid":false,"given":"Lu","family":"Chen","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3816-8450","authenticated-orcid":false,"given":"Yunjun","family":"Gao","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]}],"member":"320","published-online":{"date-parts":[[2024,5,30]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2013. Word to vectors. https:\/\/code.google.com\/archive\/p\/word2vec"},{"key":"e_1_2_1_2_1","unstructured":"2023. Moby project. https:\/\/mobyproject.org"},{"key":"e_1_2_1_3_1","unstructured":"2023. National Library of Medicine. http:\/\/www.ncbi.nlm.nih.gov\/genome"},{"key":"e_1_2_1_4_1","unstructured":"2023. Nvidia GPU introduction. https:\/\/www.nvidia.com\/en-us\/geforce\/graphics-cards\/40-series\/rtx-4090"},{"key":"e_1_2_1_5_1","unstructured":"2023. The source code of GTS. https:\/\/github.com\/ZJU-DAILY\/GTS\/"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11227-021-03975-2"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1177\/1550147718811827"},{"key":"e_1_2_1_8_1","volume-title":"Proceedings of the SecondWorkshop on Very Large Digital Libraries.","author":"Bolettieri Paolo","year":"2009","unstructured":"Paolo Bolettieri, Andrea Esuli, Fabrizio Falchi, Claudio Lucchese, Raffaele Perego, and Fausto Rabitti. 2009. Enabling content-based image retrieval in very large digital libraries. In Proceedings of the SecondWorkshop on Very Large Digital Libraries."},{"key":"e_1_2_1_9_1","doi-asserted-by":"crossref","unstructured":"Tolga Bozkaya and Z. Meral \u00d6zsoyoglu. 1997. Distance-Based Indexing for High-Dimensional Metric Spaces. In SIGMOD. 357--368.","DOI":"10.1145\/253262.253345"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/328939.328959"},{"key":"e_1_2_1_11_1","unstructured":"Sergey Brin. 1995. Near Neighbor Search in Large Metric Spaces. In VLDB Umeshwar Dayal Peter M. D. Gray and Shojiro Nishio (Eds.). 574--584."},{"key":"e_1_2_1_12_1","unstructured":"Carrie J. Cai Emily Reif Narayan Hegde Jason D. Hipp Been Kim Daniel Smilkov Martin Wattenberg Fernanda B. Vi\u00e9gas Gregory S. Corrado Martin C. Stumpe and Michael Terry. 2019. Human-Centered Tools for Coping with Imperfect Algorithms During Medical Decision-Making. In CHI. 4."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/SPIRE.2000.878182"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2004.11.014"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/502807.502808"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2015.2506556"},{"key":"e_1_2_1_17_1","volume-title":"Jensen","author":"Chen Lu","year":"2023","unstructured":"Lu Chen, Yunjun Gao, Xuan Song, Zheng Li, Yifan Zhu, Xiaoye Miao, and Christian S. Jensen. 2023. Indexing Metric Spaces for Exact Similarity Search. Comput. Surveys 55, 6 (2023), 128:1--128:39."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.14778\/3115404.3115411"},{"key":"e_1_2_1_19_1","volume-title":"M-tree: An Efficient Access Method for Similarity Search in Metric Spaces. In VLDB. 426--435.","author":"Ciaccia Paolo","year":"1997","unstructured":"Paolo Ciaccia, Marco Patella, and Pavel Zezula. 1997. M-tree: An Efficient Access Method for Similarity Search in Metric Spaces. In VLDB. 426--435."},{"key":"e_1_2_1_20_1","first-page":"1","article-title":"A coordinate-oblivious index for high-dimensional distance similarity searches on the GPU","volume":"8","author":"Donnelly Brian","year":"2020","unstructured":"Brian Donnelly and Michael Gowanlock. 2020. A coordinate-oblivious index for high-dimensional distance similarity searches on the GPU. In ICS. 8:1--8:12.","journal-title":"ICS."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0022-0000(03)00026-6"},{"key":"e_1_2_1_22_1","doi-asserted-by":"crossref","unstructured":"Maximilian Franzke Tobias Emrich Andreas Z\u00fcfle and Matthias Renz. 2016. Indexing multi-metric data. In ICDE. 1122--1133.","DOI":"10.1109\/ICDE.2016.7498318"},{"key":"e_1_2_1_23_1","volume-title":"Caiwen Ding, Xiaoye S. Li, and Hang Liu.","author":"Gaihre Anil","year":"2021","unstructured":"Anil Gaihre, Da Zheng, Scott Weitze, Lingda Li, Shuaiwen Leon Song, Caiwen Ding, Xiaoye S. Li, and Hang Liu. 2021. Dr. Top-k: delegate-centric Top-k on GPUs. In SC. 39."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.14778\/3282495.3282497"},{"key":"e_1_2_1_25_1","doi-asserted-by":"crossref","unstructured":"Gaurav Gupta Minghao Yan Benjamin Coleman Bryce Kille Ryan A. Leo Elworth Tharun Medini Todd J. Treangen and Anshumali Shrivastava. 2021. Fast Processing and Querying of 170TB of Genomics Data via a Repeated And Merged BloOm Filter (RAMBO). In SIGMOD. 2226--2234.","DOI":"10.1145\/3448016.3457333"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/602259.602266"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1287\/moor.10.2.180"},{"key":"e_1_2_1_28_1","doi-asserted-by":"crossref","unstructured":"Linjia Hu Saeid Nooshabadi and Majid Ahmadi. 2016. Parallel randomized KD-tree forest on GPU cluster for image descriptor matching. In ISCAS. 582--585.","DOI":"10.1109\/ISCAS.2016.7527307"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41568-022-00502-0"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/TBDATA.2019.2921572"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-005-0178-0"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.1983.235263"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2013.03.015"},{"key":"e_1_2_1_34_1","doi-asserted-by":"crossref","unstructured":"Juhwan Kim Jongseon Seo Jonghyeok Park Sang-Won Lee Hongchan Roh and Hyungmin Cho. 2022. ES4D: Accelerating Exact Similarity Search for High-Dimensional Vectors via Vector Slicing and In-SSD Computation. In ICCD. 298--306.","DOI":"10.1109\/ICCD56317.2022.00051"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2018.2823760"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSC.2021.3079580"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.14778\/3494124.3494151"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2020.2992440"},{"key":"e_1_2_1_39_1","doi-asserted-by":"crossref","unstructured":"Chuanwen Li Yu Gu Jianzhong Qi Jiayuan He Qingxu Deng and Ge Yu. 2018. A GPU Accelerated Update Efficient Index for kNN Queries in Road Networks. In ICDE. 881--892.","DOI":"10.1109\/ICDE.2018.00084"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-019-00555-y"},{"key":"e_1_2_1_41_1","doi-asserted-by":"crossref","unstructured":"Lijuan Luo Martin D. F. Wong and Lance Leong. 2012. Parallel implementation of R-trees on the GPU. In ASP-DAC. 353--358.","DOI":"10.1109\/ASPDAC.2012.6164973"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3588942"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jda.2011.10.004"},{"key":"e_1_2_1_44_1","volume-title":"Barrientos","author":"Mar\u00edn Mauricio","year":"2007","unstructured":"Mauricio Mar\u00edn, Roberto Uribe, and Ricardo J. Barrientos. 2007. Searching and Updating Metric Space Databases Using the Parallel EGNAT. In ICCS. 229--236."},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1016\/0167-8655(94)90095-7"},{"key":"e_1_2_1_46_1","doi-asserted-by":"crossref","unstructured":"Juraj Mosko Jakub Lokoc and Tom\u00e1s Skopal. 2011. Clustered pivot tables for I\/O-optimized similarity search. In SISAP. 17--24.","DOI":"10.1145\/1995412.1995418"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jocs.2011.01.006"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.is.2011.01.002"},{"key":"e_1_2_1_49_1","doi-asserted-by":"crossref","unstructured":"Guillermo Ruiz Francisco Santoyo Edgar Ch\u00e1vez Karina Figueroa and Eric Sadit Tellez. 2013. Extreme Pivots for Faster Metric Indexes. In SISAP. 115--126.","DOI":"10.1007\/978-3-642-41062-8_12"},{"key":"e_1_2_1_50_1","doi-asserted-by":"crossref","unstructured":"Amirhesam Shahvarani and Hans-Arno Jacobsen. 2016. A Hybrid B-tree as Solution for In-Memory Indexing on CPU-GPU Heterogeneous Computing Platforms. In SIGMOD. 1523--1538.","DOI":"10.1145\/2882903.2882918"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.is.2020.101507"},{"key":"e_1_2_1_52_1","doi-asserted-by":"crossref","unstructured":"Dejun Teng Akshay Nehe Prajeeth Emanuel Furqan Baig Jun Kong and Fusheng Wang. 2021. GPU-based Real-time Contact Tracing at Scale. In SIGSPATIAL. 1--10.","DOI":"10.1145\/3474717.3483627"},{"key":"e_1_2_1_53_1","volume-title":"Promiscuous molecules for smarter file operations in DNA-based data storage. Nature communications 12, 1","author":"Tomek Kyle J","year":"2021","unstructured":"Kyle J Tomek, Kevin Volkel, Elaine W Indermaur, James M Tuck, and Albert J Keung. 2021. Promiscuous molecules for smarter file operations in DNA-based data storage. Nature communications 12, 1 (2021), 3518."},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476249.3476255"},{"key":"e_1_2_1_55_1","first-page":"3225","article-title":"Effective Similarity Search on Heterogeneous Networks: A Meta-Path Free Approach","volume":"34","author":"Wang Yue","year":"2022","unstructured":"Yue Wang, Zhe Wang, Ziyuan Zhao, Zijian Li, Xun Jian, Hao Xin, Lei Chen, Jianchun Song, Zhenhong Chen, and Meng Zhao. 2022. Effective Similarity Search on Heterogeneous Networks: A Meta-Path Free Approach. IEEE Trans. Knowl. Data Eng. 34, 7 (2022), 3225--3240.","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"e_1_2_1_56_1","doi-asserted-by":"crossref","unstructured":"Martin Winter Mathias Parger Daniel Mlakar and Markus Steinberger. 2021. Are dynamic memory managers on GPUs slow?: a survey and benchmarks. In PPoPP. 219--233.","DOI":"10.1145\/3437801.3441612"},{"key":"e_1_2_1_57_1","doi-asserted-by":"crossref","unstructured":"Simin You Jianting Zhang and Le Gruenwald. 2013. Parallel spatial query processing on GPUs using R-trees. In SIGSPATIAL. 23--31.","DOI":"10.1145\/2534921.2534949"},{"key":"e_1_2_1_58_1","doi-asserted-by":"crossref","unstructured":"Yuanhang Yu Dong Wen Ying Zhang Lu Qin Wenjie Zhang and Xuemin Lin. 2022. GPU-accelerated Proximity Graph Approximate Nearest Neighbor Search and Construction. In ICDE. 552--564.","DOI":"10.1109\/ICDE53745.2022.00046"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/3588715"},{"key":"e_1_2_1_60_1","volume-title":"SONG: Approximate Nearest Neighbor Search on GPU. In ICDE. 1033--1044.","author":"Zhao Weijie","year":"2020","unstructured":"Weijie Zhao, Shulong Tan, and Ping Li. 2020. SONG: Approximate Nearest Neighbor Search on GPU. In ICDE. 1033--1044."},{"key":"e_1_2_1_61_1","doi-asserted-by":"crossref","unstructured":"Jingbo Zhou Qi Guo H. V. Jagadish Lubos Krc\u00e1l Siyuan Liu Wenhao Luan Anthony K. H. Tung Yueji Yang and Yuxin Zheng. 2018. A Generic Inverted Index Framework for Similarity Search on the GPU. In ICDE. 893--904.","DOI":"10.1109\/ICDE.2018.00085"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-021-00691-4"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.14778\/3547305.3547317"}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3654945","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3654945","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,21]],"date-time":"2025-08-21T14:38:40Z","timestamp":1755787120000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3654945"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,29]]},"references-count":63,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2024,5,29]]}},"alternative-id":["10.1145\/3654945"],"URL":"https:\/\/doi.org\/10.1145\/3654945","relation":{},"ISSN":["2836-6573"],"issn-type":[{"value":"2836-6573","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,5,29]]}}}