{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,27]],"date-time":"2026-03-27T00:05:54Z","timestamp":1774569954287,"version":"3.50.1"},"reference-count":47,"publisher":"Association for Computing Machinery (ACM)","issue":"8","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2023,4]]},"abstract":"<jats:p>The approximate nearest neighbor (ANN) search in high-dimensional spaces is a fundamental but computationally very expensive problem. Many methods have been designed for solving the ANN problem, such as LSH-based methods and graph-based methods. The LSH-based methods can be costly to reach high query quality due to the hash-boundary issues, while the graph-based methods can achieve better query performance by greedy expansion in an approximate proximity graph (APG). However, the construction cost of these APGs can be one or two orders of magnitude higher than that for building hash-based indexes. In addition, they fail short in incrementally maintaining APGs as the underlying dataset evolves. In this paper, we propose a novel approach named LSH-APG to build APGs and facilitate fast ANN search using a lightweight LSH framework. LSH-APG builds an APG via consecutively inserting points based on their nearest neighbor relationship with an efficient and accurate LSH-based search strategy. A high-quality entry point selection technique and an LSH-based pruning condition are developed to accelerate index construction and query processing by reducing the number of points to be accessed during the search. LSH-APG supports fast maintenance of APGs in lieu of building them from scratch as dataset evolves. Its maintenance cost and query cost for a point is proven to be less affected by dataset cardinality. Extensive experiments on real-world and synthetic datasets demonstrate that LSH-APG incurs significantly less construction cost but achieves better query performance than existing graph-based methods.<\/jats:p>","DOI":"10.14778\/3594512.3594527","type":"journal-article","created":{"date-parts":[[2023,6,23]],"date-time":"2023-06-23T00:28:36Z","timestamp":1687480116000},"page":"1979-1991","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":59,"title":["Towards Efficient Index Construction and Approximate Nearest Neighbor Search in High-Dimensional Spaces"],"prefix":"10.14778","volume":"16","author":[{"given":"Xi","family":"Zhao","sequence":"first","affiliation":[{"name":"Hong Kong University of Science and Technology"}]},{"given":"Yao","family":"Tian","sequence":"additional","affiliation":[{"name":"Hong Kong University of Science and Technology"}]},{"given":"Kai","family":"Huang","sequence":"additional","affiliation":[{"name":"Hong Kong University of Science and Technology"}]},{"given":"Bolong","family":"Zheng","sequence":"additional","affiliation":[{"name":"Huazhong University of Science and Technology"}]},{"given":"Xiaofang","family":"Zhou","sequence":"additional","affiliation":[{"name":"Hong Kong University of Science and Technology"}]}],"member":"320","published-online":{"date-parts":[[2023,6,22]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"crossref","unstructured":"Laurent Amsaleg Oussama Chelly Teddy Furon St\u00e9phane Girard Michael E. Houle Ken-ichi Kawarabayashi and Michael Nett. 2015. Estimating Local Intrinsic Dimensionality. In KDD. 29--38.  Laurent Amsaleg Oussama Chelly Teddy Furon St\u00e9phane Girard Michael E. Houle Ken-ichi Kawarabayashi and Michael Nett. 2015. Estimating Local Intrinsic Dimensionality. In KDD. 29--38.","DOI":"10.1145\/2783258.2783405"},{"key":"e_1_2_1_2_1","unstructured":"Alexandr Andoni and Piotr Indyk. 2016. LSH Algorithm and Implementation (E2LSH). https:\/\/www.mit.edu\/~andoni\/LSH  Alexandr Andoni and Piotr Indyk. 2016. LSH Algorithm and Implementation (E2LSH). https:\/\/www.mit.edu\/~andoni\/LSH"},{"key":"e_1_2_1_3_1","volume-title":"ANN-Benchmarks: A benchmarking tool for approximate nearest neighbor algorithms. Inf. Syst. 87","author":"Aum\u00fcller Martin","year":"2020","unstructured":"Martin Aum\u00fcller , Erik Bernhardsson , and Alexander John Faithfull . 2020. ANN-Benchmarks: A benchmarking tool for approximate nearest neighbor algorithms. Inf. Syst. 87 ( 2020 ). Martin Aum\u00fcller, Erik Bernhardsson, and Alexander John Faithfull. 2020. ANN-Benchmarks: A benchmarking tool for approximate nearest neighbor algorithms. Inf. Syst. 87 (2020)."},{"key":"e_1_2_1_4_1","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1021\/acs.jcim.8b00524","article-title":"Polypharmacology Browser PPB2: Target Prediction Combining Nearest Neighbors with Machine Learning","volume":"59","author":"Awale Mahendra","year":"2019","unstructured":"Mahendra Awale and Jean-Louis Reymond . 2019 . Polypharmacology Browser PPB2: Target Prediction Combining Nearest Neighbors with Machine Learning . J. Chem. Inf. Model. 59 , 1 (2019), 10 -- 17 . Mahendra Awale and Jean-Louis Reymond. 2019. Polypharmacology Browser PPB2: Target Prediction Combining Nearest Neighbors with Machine Learning. J. Chem. Inf. Model. 59, 1 (2019), 10--17.","journal-title":"J. Chem. Inf. Model."},{"key":"e_1_2_1_5_1","volume-title":"SIGMOD Conference. ACM Press, 322--331","author":"Beckmann Norbert","year":"1990","unstructured":"Norbert Beckmann , Hans-Peter Kriegel , Ralf Schneider , and Bernhard Seeger . 1990 . The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles . In SIGMOD Conference. ACM Press, 322--331 . Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, and Bernhard Seeger. 1990. The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles. In SIGMOD Conference. ACM Press, 322--331."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/361002.361007"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/357775.357776"},{"key":"e_1_2_1_8_1","first-page":"1","article-title":"NN-Descent on High-Dimensional Data","volume":"20","author":"Bratic Brankica","year":"2018","unstructured":"Brankica Bratic , Michael E. Houle , Vladimir Kurbalija , Vincent Oria , and Milos Radovanovic . 2018 . NN-Descent on High-Dimensional Data . In WIMS. 20 : 1 -- 20 :8. Brankica Bratic, Michael E. Houle, Vladimir Kurbalija, Vincent Oria, and Milos Radovanovic. 2018. NN-Descent on High-Dimensional Data. In WIMS. 20:1--20:8.","journal-title":"WIMS."},{"key":"e_1_2_1_9_1","volume-title":"M-tree: An Efficient Access Method for Similarity Search in Metric Spaces. In VLDB. Morgan Kaufmann, 426--435.","author":"Ciaccia Paolo","year":"1997","unstructured":"Paolo Ciaccia , Marco Patella , and Pavel Zezula . 1997 . M-tree: An Efficient Access Method for Similarity Search in Metric Spaces. In VLDB. Morgan Kaufmann, 426--435. Paolo Ciaccia, Marco Patella, and Pavel Zezula. 1997. M-tree: An Efficient Access Method for Similarity Search in Metric Spaces. In VLDB. Morgan Kaufmann, 426--435."},{"key":"e_1_2_1_10_1","volume-title":"IEEE\/SP 13th Workshop on Statistical Signal Processing","author":"Costa Jose A","year":"2005","unstructured":"Jose A Costa , Abhishek Girotra , and AO Hero . 2005 . Estimating local intrinsic dimension with k-nearest neighbor graphs . In IEEE\/SP 13th Workshop on Statistical Signal Processing , 2005. IEEE, 417--422. Jose A Costa, Abhishek Girotra, and AO Hero. 2005. Estimating local intrinsic dimension with k-nearest neighbor graphs. In IEEE\/SP 13th Workshop on Statistical Signal Processing, 2005. IEEE, 417--422."},{"key":"e_1_2_1_11_1","doi-asserted-by":"crossref","first-page":"385","DOI":"10.1007\/978-3-540-24587-2_40","article-title":"On the Locality Properties of Space-Filling Curves","volume":"2906","author":"Dai H. K.","year":"2003","unstructured":"H. K. Dai and Hung-Chi Su . 2003 . On the Locality Properties of Space-Filling Curves . In ISAAC (Lecture Notes in Computer Science) , Vol. 2906. 385 -- 394 . H. K. Dai and Hung-Chi Su. 2003. On the Locality Properties of Space-Filling Curves. In ISAAC (Lecture Notes in Computer Science), Vol. 2906. 385--394.","journal-title":"ISAAC (Lecture Notes in Computer Science)"},{"key":"e_1_2_1_12_1","volume-title":"Symposium on Computational Geometry. 253--262","author":"Datar Mayur","unstructured":"Mayur Datar , Nicole Immorlica , Piotr Indyk , and Vahab S. Mirrokni . 2004. Locality-sensitive hashing scheme based on p-stable distributions . In Symposium on Computational Geometry. 253--262 . Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S. Mirrokni. 2004. Locality-sensitive hashing scheme based on p-stable distributions. In Symposium on Computational Geometry. 253--262."},{"key":"e_1_2_1_13_1","doi-asserted-by":"crossref","unstructured":"Wei Dong Moses Charikar and Kai Li. 2011. Efficient k-nearest neighbor graph construction for generic similarity measures. In WWW. 577--586.  Wei Dong Moses Charikar and Kai Li. 2011. Efficient k-nearest neighbor graph construction for generic similarity measures. In WWW. 577--586.","DOI":"10.1145\/1963405.1963487"},{"key":"e_1_2_1_14_1","article-title":"Fast Distributed kNN Graph Construction Using Auto-tuned Locality-sensitive Hashing","volume":"11","author":"Eiras-Franco Carlos","year":"2020","unstructured":"Carlos Eiras-Franco , David Mart\u00ednez-Rego , Leslie Kanthan , C\u00e9sar Pi\u00f1eiro , Antonio Bahamonde , Bertha Guijarro-Berdi\u00f1as , and Amparo Alonso-Betanzos . 2020 . Fast Distributed kNN Graph Construction Using Auto-tuned Locality-sensitive Hashing . ACM Trans. Intell. Syst. Technol. 11 , 6 (2020), 71:1--71:18. Carlos Eiras-Franco, David Mart\u00ednez-Rego, Leslie Kanthan, C\u00e9sar Pi\u00f1eiro, Antonio Bahamonde, Bertha Guijarro-Berdi\u00f1as, and Amparo Alonso-Betanzos. 2020. Fast Distributed kNN Graph Construction Using Auto-tuned Locality-sensitive Hashing. ACM Trans. Intell. Syst. Technol. 11, 6 (2020), 71:1--71:18.","journal-title":"ACM Trans. Intell. Syst. Technol."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-8655(01)00017-4"},{"key":"e_1_2_1_16_1","volume-title":"EFANNA : An Extremely Fast Approximate Nearest Neighbor Search Algorithm Based on kNN Graph. CoRR abs\/1609.07228","author":"Fu Cong","year":"2016","unstructured":"Cong Fu and Deng Cai . 2016 . EFANNA : An Extremely Fast Approximate Nearest Neighbor Search Algorithm Based on kNN Graph. CoRR abs\/1609.07228 (2016). Cong Fu and Deng Cai. 2016. EFANNA : An Extremely Fast Approximate Nearest Neighbor Search Algorithm Based on kNN Graph. CoRR abs\/1609.07228 (2016)."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.14778\/3303753.3303754"},{"key":"e_1_2_1_18_1","unstructured":"Junhao Gan Jianlin Feng Qiong Fang and Wilfred Ng. 2012. Locality-sensitive hashing scheme based on dynamic collision counting. In SIGMOD. 541--552.  Junhao Gan Jianlin Feng Qiong Fang and Wilfred Ng. 2012. Locality-sensitive hashing scheme based on dynamic collision counting. In SIGMOD. 541--552."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2013.240"},{"key":"e_1_2_1_20_1","unstructured":"Aristides Gionis Piotr Indyk and Rajeev Motwani. 1999. Similarity Search in High Dimensions via Hashing. In VLDB. 518--529.  Aristides Gionis Piotr Indyk and Rajeev Motwani. 1999. Similarity Search in High Dimensions via Hashing. In VLDB. 518--529."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/18.720541"},{"key":"e_1_2_1_22_1","volume-title":"R-Trees: A Dynamic Index Structure for Spatial Searching. In SIGMOD Conference. 47--57","author":"Guttman Antonin","year":"1984","unstructured":"Antonin Guttman . 1984 . R-Trees: A Dynamic Index Structure for Spatial Searching. In SIGMOD Conference. 47--57 . Antonin Guttman. 1984. R-Trees: A Dynamic Index Structure for Spatial Searching. In SIGMOD Conference. 47--57."},{"key":"e_1_2_1_23_1","volume-title":"FANNG: Fast Approximate Nearest Neighbour Graphs","author":"Harwood Ben","year":"2016","unstructured":"Ben Harwood and Tom Drummond . 2016 . FANNG: Fast Approximate Nearest Neighbour Graphs . In CVPR. IEEE Computer Society , 5713--5722. Ben Harwood and Tom Drummond. 2016. FANNG: Fast Approximate Nearest Neighbour Graphs. In CVPR. IEEE Computer Society, 5713--5722."},{"key":"e_1_2_1_24_1","volume-title":"MIDAS: Towards Efficient and Effective Maintenance of Canned Patterns in Visual Graph Query Interfaces. In SIGMOD Conference. 764--776","author":"Huang Kai","year":"2021","unstructured":"Kai Huang , Huey-Eng Chua , Sourav S. Bhowmick , Byron Choi , and Shuigeng Zhou . 2021 . MIDAS: Towards Efficient and Effective Maintenance of Canned Patterns in Visual Graph Query Interfaces. In SIGMOD Conference. 764--776 . Kai Huang, Huey-Eng Chua, Sourav S. Bhowmick, Byron Choi, and Shuigeng Zhou. 2021. MIDAS: Towards Efficient and Effective Maintenance of Canned Patterns in Visual Graph Query Interfaces. In SIGMOD Conference. 764--776."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.14778\/2850469.2850470"},{"key":"e_1_2_1_26_1","doi-asserted-by":"crossref","unstructured":"Piotr Indyk and Rajeev Motwani. 1998. Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality. In STOC. 604--613.  Piotr Indyk and Rajeev Motwani. 1998. Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality. In STOC. 604--613.","DOI":"10.1145\/276698.276876"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2010.57"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2019.2909204"},{"key":"e_1_2_1_29_1","volume-title":"On the Merge of k-NN Graph. CoRR abs\/1908.00814","author":"Lin Peng-Cheng","year":"2019","unstructured":"Peng-Cheng Lin and Wan-Lei Zhao . 2019. On the Merge of k-NN Graph. CoRR abs\/1908.00814 ( 2019 ). Peng-Cheng Lin and Wan-Lei Zhao. 2019. On the Merge of k-NN Graph. CoRR abs\/1908.00814 (2019)."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.14778\/2732939.2732947"},{"key":"e_1_2_1_31_1","volume-title":"R2LSH: A Nearest Neighbor Search Scheme Based on Two-dimensional Projected Spaces","author":"Lu Kejing","unstructured":"Kejing Lu and Mineichi Kudo . 2020. R2LSH: A Nearest Neighbor Search Scheme Based on Two-dimensional Projected Spaces . In ICDE. IEEE , 1045--1056. Kejing Lu and Mineichi Kudo. 2020. R2LSH: A Nearest Neighbor Search Scheme Based on Two-dimensional Projected Spaces. In ICDE. IEEE, 1045--1056."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.14778\/3397230.3397240"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.is.2013.10.006"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2018.2889473"},{"key":"e_1_2_1_35_1","volume-title":"Zanoni Dias, and Ricardo da Silva Torres.","author":"Vargas Mu\u00f1oz Javier Alvaro","year":"2019","unstructured":"Javier Alvaro Vargas Mu\u00f1oz , Marcos Andr\u00e9 Gon\u00e7alves , Zanoni Dias, and Ricardo da Silva Torres. 2019 . Hierarchical Clustering-Based Graphs for Large Scale Approximate Nearest Neighbor Search. Pattern Recognit . 96 (2019). Javier Alvaro Vargas Mu\u00f1oz, Marcos Andr\u00e9 Gon\u00e7alves, Zanoni Dias, and Ricardo da Silva Torres. 2019. Hierarchical Clustering-Based Graphs for Large Scale Approximate Nearest Neighbor Search. Pattern Recognit. 96 (2019)."},{"key":"e_1_2_1_36_1","volume-title":"BTW (Informatik-Fachberichte)","author":"Ooi Beng Chin","unstructured":"Beng Chin Ooi . 1987. Spatial kd-Tree: A Data Structure for Geographic Database . In BTW (Informatik-Fachberichte) , Vol. 136 . Springer , 247--258. Beng Chin Ooi. 1987. Spatial kd-Tree: A Data Structure for Geographic Database. In BTW (Informatik-Fachberichte), Vol. 136. Springer, 247--258."},{"key":"e_1_2_1_37_1","volume-title":"ICML (Proceedings of Machine Learning Research)","volume":"119","author":"Prokhorenkova Liudmila","year":"2020","unstructured":"Liudmila Prokhorenkova and Aleksandr Shekhovtsov . 2020 . Graph-based Nearest Neighbor Search: From Practice to Theory . In ICML (Proceedings of Machine Learning Research) , Vol. 119 . PMLR, 7803--7813. Liudmila Prokhorenkova and Aleksandr Shekhovtsov. 2020. Graph-based Nearest Neighbor Search: From Practice to Theory. In ICML (Proceedings of Machine Learning Research), Vol. 119. PMLR, 7803--7813."},{"key":"e_1_2_1_38_1","unstructured":"Technical Report. 2023. Available at:. https:\/\/github.com\/Jacyhust\/LSH-APG\/tree\/main\/Report  Technical Report. 2023. Available at:. https:\/\/github.com\/Jacyhust\/LSH-APG\/tree\/main\/Report"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/MCI.2015.2437512"},{"key":"e_1_2_1_40_1","doi-asserted-by":"crossref","unstructured":"Yufei Tao Ke Yi Cheng Sheng and Panos Kalnis. 2009. Quality and efficiency in high dimensional nearest neighbor search. In SIGMOD. 563--576.  Yufei Tao Ke Yi Cheng Sheng and Panos Kalnis. 2009. Quality and efficiency in high dimensional nearest neighbor search. In SIGMOD. 563--576.","DOI":"10.1145\/1559845.1559905"},{"key":"e_1_2_1_41_1","volume-title":"Information Retrieval Based Nearest Neighbor Classification for Fine-Grained Bug Severity Prediction","author":"Tian Yuan","unstructured":"Yuan Tian , David Lo , and Chengnian Sun . 2012. Information Retrieval Based Nearest Neighbor Classification for Fine-Grained Bug Severity Prediction . In WCRE. IEEE Computer Society , 215--224. Yuan Tian, David Lo, and Chengnian Sun. 2012. Information Retrieval Based Nearest Neighbor Classification for Fine-Grained Bug Severity Prediction. In WCRE. IEEE Computer Society, 215--224."},{"key":"e_1_2_1_42_1","doi-asserted-by":"crossref","unstructured":"Yao Tian Xi Zhao and Xiaofang Zhou. 2022. DB-LSH: Locality-Sensitive Hashing with Query-based Dynamic Bucketing. In ICDE. 2250--2262.  Yao Tian Xi Zhao and Xiaofang Zhou. 2022. DB-LSH: Locality-Sensitive Hashing with Query-based Dynamic Bucketing. In ICDE. 2250--2262.","DOI":"10.1109\/ICDE53745.2022.00214"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476249.3476255"},{"key":"e_1_2_1_44_1","unstructured":"Roger Weber Hans-J\u00f6rg Schek and Stephen Blott. 1998. A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces. In VLDB. Morgan Kaufmann 194--205.  Roger Weber Hans-J\u00f6rg Schek and Stephen Blott. 1998. A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces. In VLDB. Morgan Kaufmann 194--205."},{"key":"e_1_2_1_45_1","volume-title":"Data Structures and Algorithms for Nearest Neighbor Search in General Metric Spaces","author":"Yianilos Peter N.","unstructured":"Peter N. Yianilos . 1993. Data Structures and Algorithms for Nearest Neighbor Search in General Metric Spaces . In SODA. ACM\/SIAM , 311--321. Peter N. Yianilos. 1993. Data Structures and Algorithms for Nearest Neighbor Search in General Metric Spaces. In SODA. ACM\/SIAM, 311--321."},{"key":"e_1_2_1_46_1","volume-title":"k-NN Graph Construction: a Generic Online Approach. CoRR abs\/1804.03032","author":"Zhao Wan-Lei","year":"2018","unstructured":"Wan-Lei Zhao . 2018. k-NN Graph Construction: a Generic Online Approach. CoRR abs\/1804.03032 ( 2018 ). Wan-Lei Zhao. 2018. k-NN Graph Construction: a Generic Online Approach. CoRR abs\/1804.03032 (2018)."},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.14778\/3377369.3377374"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3594512.3594527","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,23]],"date-time":"2023-06-23T00:32:54Z","timestamp":1687480374000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3594512.3594527"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,4]]},"references-count":47,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2023,4]]}},"alternative-id":["10.14778\/3594512.3594527"],"URL":"https:\/\/doi.org\/10.14778\/3594512.3594527","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2023,4]]},"assertion":[{"value":"2023-06-22","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}