{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T01:47:25Z","timestamp":1760060845715,"version":"build-2065373602"},"reference-count":120,"publisher":"MDPI AG","issue":"10","license":[{"start":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T00:00:00Z","timestamp":1759968000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["BDCC"],"abstract":"<jats:p>In this study, we propose a novel adaptive algorithm for approximate nearest neighbor (ANN) search, based on the inverted file (IVF) index (cluster-based index) and online query complexity classification. The concept of the classical IVF search implemented in vector databases is as follows: all data vectors are divided into clusters, and each cluster is assigned to its central point (centroid). For an ANN search query, the closest centroids are determined, and the further search continues in the corresponding clusters only. In our study, the complexity of each query is assessed and classified with the use of results of an initial trial search in a limited number of clusters. Based on this classification, the algorithm dynamically determines the presumably sufficient number of clusters which is sufficient to achieve the desired Recall value, thereby improving vector search efficiency. Our experiments show that such a complexity classifier can be built with the use of a single feature, and we propose an algorithm for its training. We studied the impact of various features on the query processing and discovered a strong dependence on the number of clusters that contains at least one nearest neighbor (productive clusters). The new algorithm is designed to be implemented on top of the IVF search which is a well-known algorithm for approximate nearest neighbor search and uses existing IVF indexes that are widely used in the most popular vector database management systems, such as pgvector. The results obtained demonstrate a significant increase in the speed of nearest neighbor search (up to 35%) while maintaining a high Recall rate of 0.99. Additionally, the search algorithm is deterministic, which might be extremely important for tasks where the reproducibility of results plays a crucial role. The developed algorithm has been tested on datasets of varying sizes up to one billion data vectors.<\/jats:p>","DOI":"10.3390\/bdcc9100254","type":"journal-article","created":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T16:54:31Z","timestamp":1760028871000},"page":"254","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Fast Adaptive Approximate Nearest Neighbor Search with Cluster-Shaped Indices"],"prefix":"10.3390","volume":"9","author":[{"given":"Vladimir","family":"Kazakovtsev","sequence":"first","affiliation":[{"name":"Institute of Informatics and Telecommunications, Reshetnev Siberian State University of Science and Technology, Prosp. Krasnoyarskiy Rabochiy 31, Krasnoyarsk 660037, Russia"},{"name":"Laboratory \u201cHybrid Methods of Modelling and Optimization in Complex Systems\u201d, Siberian Federal University, ul. Kirenskogo 26A, Krasnoyarsk 660074, Russia"}]},{"given":"Mikhail","family":"Plekhanov","sequence":"additional","affiliation":[{"name":"Department of Information Technologies, Novosibirsk State University, ul. Pirogova 1, Novosibirsk 630090, Russia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7683-0751","authenticated-orcid":false,"given":"Alexandr","family":"Naumchev","sequence":"additional","affiliation":[{"name":"Department of Information Technologies, Novosibirsk State University, ul. Pirogova 1, Novosibirsk 630090, Russia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8257-7329","authenticated-orcid":false,"given":"Guzel","family":"Shkaberina","sequence":"additional","affiliation":[{"name":"Institute of Informatics and Telecommunications, Reshetnev Siberian State University of Science and Technology, Prosp. Krasnoyarskiy Rabochiy 31, Krasnoyarsk 660037, Russia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3997-342X","authenticated-orcid":false,"given":"Igor","family":"Masich","sequence":"additional","affiliation":[{"name":"Institute of Informatics and Telecommunications, Reshetnev Siberian State University of Science and Technology, Prosp. Krasnoyarskiy Rabochiy 31, Krasnoyarsk 660037, Russia"},{"name":"Laboratory \u201cHybrid Methods of Modelling and Optimization in Complex Systems\u201d, Siberian Federal University, ul. Kirenskogo 26A, Krasnoyarsk 660074, Russia"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-0164-6200","authenticated-orcid":false,"given":"Lyudmila","family":"Egorova","sequence":"additional","affiliation":[{"name":"Institute of Informatics and Telecommunications, Reshetnev Siberian State University of Science and Technology, Prosp. Krasnoyarskiy Rabochiy 31, Krasnoyarsk 660037, Russia"},{"name":"Laboratory \u201cHybrid Methods of Modelling and Optimization in Complex Systems\u201d, Siberian Federal University, ul. Kirenskogo 26A, Krasnoyarsk 660074, Russia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5564-9267","authenticated-orcid":false,"given":"Alena","family":"Stupina","sequence":"additional","affiliation":[{"name":"Institute of Informatics and Telecommunications, Reshetnev Siberian State University of Science and Technology, Prosp. Krasnoyarskiy Rabochiy 31, Krasnoyarsk 660037, Russia"},{"name":"Laboratory \u201cHybrid Methods of Modelling and Optimization in Complex Systems\u201d, Siberian Federal University, ul. Kirenskogo 26A, Krasnoyarsk 660074, Russia"}]},{"given":"Aleksey","family":"Popov","sequence":"additional","affiliation":[{"name":"Institute of Informatics and Telecommunications, Reshetnev Siberian State University of Science and Technology, Prosp. Krasnoyarskiy Rabochiy 31, Krasnoyarsk 660037, Russia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0667-4001","authenticated-orcid":false,"given":"Lev","family":"Kazakovtsev","sequence":"additional","affiliation":[{"name":"Institute of Informatics and Telecommunications, Reshetnev Siberian State University of Science and Technology, Prosp. Krasnoyarskiy Rabochiy 31, Krasnoyarsk 660037, Russia"},{"name":"Laboratory \u201cHybrid Methods of Modelling and Optimization in Complex Systems\u201d, Siberian Federal University, ul. Kirenskogo 26A, Krasnoyarsk 660074, Russia"}]}],"member":"1968","published-online":{"date-parts":[[2025,10,9]]},"reference":[{"key":"ref_1","first-page":"39","article-title":"A Survey on Nearest Neighbor Search Methods","volume":"95","author":"Abbasifard","year":"2014","journal-title":"Int. J. Comput. Appl."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1007\/BF02834632","article-title":"Mahalanobis Distance","volume":"4","author":"McLachlan","year":"1999","journal-title":"Resonance"},{"key":"ref_3","unstructured":"Ponomarenko, A., Malkov, Y., Logvinov, A., and Krylov, V. (December, January 29). Approximate Nearest Neighbor Search Small World Approach. Proceedings of the International Conference on Information and Communication Technologies and Applications, Orlando, FL, USA."},{"key":"ref_4","first-page":"1","article-title":"Efficient approximate nearest neighbor search in multi-dimensional databases","volume":"1","author":"Peng","year":"2023","journal-title":"Proc. ACM Manag. Data"},{"key":"ref_5","unstructured":"Bhatia, N., and Ashev, V. (2010). Survey of Nearest Neighbor Techniques. arXiv."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3459665","article-title":"K-nearest neighbour classifiers-a tutorial","volume":"54","author":"Cunningham","year":"2021","journal-title":"ACM Comput. Surv."},{"key":"ref_7","unstructured":"Hwang, Y., Han, B., and Ahn, H.-K. (2012, January 16\u201321). A Fast Nearest Neighbor Search Algorithm by Nonlinear Embedding. Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Kaminska, O., Cornelis, C., and Hoste, V. (2023). Fuzzy rough nearest neighbour methods for aspect-based sentiment analysis. Electronics, 12.","DOI":"10.3390\/electronics12051088"},{"key":"ref_9","unstructured":"Weber, R., Schek, H.-J., and Blott, S. (1998, January 24\u201327). A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces. Proceedings of the 24th International Conference on Very Large Data Bases (VLDB), New York, NY, USA."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"965","DOI":"10.1016\/j.patrec.2007.01.001","article-title":"A Method for Initialising the K-means Clustering Algorithm Using kd-trees","volume":"28","author":"Heneghan","year":"2007","journal-title":"Pattern Recognit. Lett."},{"key":"ref_11","first-page":"149","article-title":"Nearest Neighbor Search by Using Partial KD-tree Method","volume":"20","author":"Kraus","year":"2008","journal-title":"Theor. Appl. Genet."},{"key":"ref_12","unstructured":"Narasimhulu, Y., Suthar, A., Pasunuri, R., and Vadlamudi, C.V. (2021, January 25\u201327). CKD-Tree: An Improved KD-Tree Construction Algorithm. Proceedings of the International Semantic Intelligence Conference 2021 (ISIC 2021), New Delhi, India."},{"key":"ref_13","unstructured":"Yen, S.-H., Shih, C.-Y., Chang, H.-W., and Li, T.-K. (2010, January 20\u201322). Nearest Neighbor Searching in High Dimensions Using Multiple KD-trees. Proceedings of the 10th WSEAS International Conference on Signal Processing, Computational Geometry and Artificial Vision, Taipei, Taiwan."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Sun, Y., Zhao, T., Yoon, S., and Lee, Y. (2021). A Hybrid Approach Combining R*-Tree and k-d Trees to Improve Linked Open Data Query Performance. Appl. Sci., 11.","DOI":"10.3390\/app11052405"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Guttman, A. (1984, January 18\u201321). R-Trees: A Dynamic Index Structure for Spatial Searching. Proceedings of the 1984 ACM SIGMOD International Conference on Management of Data, Boston, MA, USA.","DOI":"10.1145\/602264.602266"},{"key":"ref_16","unstructured":"Papadopoulos, A., and Manolopoulos, Y. (1997, January 8\u201310). Performance of Nearest Neighbor Queries in R-Trees. Proceedings of the 6th International Conference on Database Theory (ICDT\u201997), Delphi, Greece."},{"key":"ref_17","first-page":"1","article-title":"The rlr-tree: A reinforcement learning based r-tree for spatial data","volume":"1","author":"Gu","year":"2023","journal-title":"Proc. ACM Manag. Data"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"16","DOI":"10.1145\/290593.290596","article-title":"Enhanced Nearest Neighbour Search on the R-tree","volume":"27","author":"Cheung","year":"1998","journal-title":"ACM SIGMOD Rec."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"364","DOI":"10.1145\/1071610.1071612","article-title":"iDistance: An Adaptive B+-tree Based Indexing Method for Nearest Neighbor Search","volume":"30","author":"Jagadish","year":"2005","journal-title":"ACM Trans. Database Syst."},{"key":"ref_20","first-page":"8175","article-title":"The B+-tree-based Method for Nearest Neighbor Queries in Traffic Simulation Systems","volume":"12","author":"Song","year":"2014","journal-title":"TELKOMNIKA Indones. J. Electr. Eng."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Jafari, O., Maurya, P., Islam, K.M., and Nagarkar, P. (2021). Optimizing Fair Approximate Nearest Neighbor Searches Using Threaded b+\u2212Trees. International Conference on Similarity Search and Applications, Springer International Publishing.","DOI":"10.1007\/978-3-030-89657-7_11"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Beygelzimer, A., Kakade, S., and Langford, J. (2006, January 25\u201329). Cover Trees for Nearest Neighbor. Proceedings of the 23rd International Conference on Machine Learning (ICML 2006), Pittsburgh, PA, USA.","DOI":"10.1145\/1143844.1143857"},{"key":"ref_23","unstructured":"Elkin, Y. (2022). A new compressed cover tree for k-nearest neighbour search and the stable-under-noise mergegram of a point cloud. arXiv."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Beeri, C., and Buneman, P. (1999). When Is \u201cNearest Neighbor\u201d Meaningful?. Database Theory\u2014ICDT\u201999, Proceedings of the 7th International Conference, Jerusalem, Israel, 10\u201312 January 1999, Springer. Lecture Notes in Computer Science, 1540.","DOI":"10.1007\/3-540-49257-7"},{"key":"ref_25","unstructured":"Navarro, G., and Pestov, V. (2012). Scalable Distributed Algorithm for Approximate Nearest Neighbor Search Problem in High Dimensional General Metric Spaces. Similarity Search and Applications, Proceedings of the 5th International Conference on Similarity Search and Applications (SISAP 2012), Toronto, ON, Canada, 9\u201310 August 2012, Springer. Lecture Notes in Computer Science."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1016\/j.is.2013.10.006","article-title":"Approximate nearest neighbor algorithm based on navigable small world graphs","volume":"45","author":"Malkov","year":"2014","journal-title":"Inf. Syst."},{"key":"ref_27","first-page":"2874","article-title":"Locality Preserving Hashing","volume":"28","author":"Zhao","year":"2014","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Song, S., Liu, L., Chen, R., Peng, W., and Wang, Y. (2023). Secure Approximate Nearest Neighbor Search with Locality-Sensitive Hashing. European Symposium on Research in Computer Security, Springer Nature.","DOI":"10.1007\/978-3-031-51479-1_21"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1109\/TIT.1967.1053964","article-title":"Nearest neighbor pattern classification","volume":"13","author":"Cover","year":"1967","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"128","DOI":"10.1109\/MSP.2007.914237","article-title":"Locality-Sensitive Hashing for Finding Nearest Neighbors [Lecture Notes]","volume":"25","author":"Slaney","year":"2008","journal-title":"IEEE Signal Process. Mag."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Wei, C., Li, C., Liu, Y., Chen, S., Zuo, Z., Wang, P., and Ye, Z. (2025). Causal Discovery and Reasoning for Continuous Variables with an Improved Bayesian Network Constructed by Locality Sensitive Hashing and Kernel Density Estimation. Entropy, 27.","DOI":"10.3390\/e27020123"},{"key":"ref_32","unstructured":"Blott, S., and Weber, R. (1998). A Simple Vector-Approximation File for Similarity Search in High-Dimensional Vector Spaces, Esprit."},{"key":"ref_33","first-page":"126","article-title":"Vector Approximation File: Cluster Bounding in High-Dimension Data Set","volume":"1","author":"Yerpude","year":"2011","journal-title":"Int. J. Eng. Adv. Technol."},{"key":"ref_34","unstructured":"Kr\u00f6ger, P., Schubert, M., and Zhu, Z. (2006, January 3\u20135). Efficient Query Processing in Arbitrary Subspaces Using Vector Approximations. Proceedings of the 18th International Conference on Scientific and Statistical Database Management (SSDBM\u203206), Vienna, Austria."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"117","DOI":"10.1109\/TPAMI.2010.57","article-title":"Product Quantization for Nearest Neighbor Search","volume":"33","author":"Douze","year":"2011","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Ge, T., He, K., Ke, Q., and Sun, J. (2013, January 23\u201328). Optimized Product Quantization for Approximate Nearest Neighbor Search. Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA.","DOI":"10.1109\/CVPR.2013.379"},{"key":"ref_37","unstructured":"Geist, M., Pietquin, O., and Fricout, G. (2009, January 22\u201324). Kernelizing Vector Quantization Algorithms. Proceedings of the European Symposium on Artificial Neural Networks (ESANN), Bruges, Belgium."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Ai, L., Cheng, H., Wang, X., Chen, C., Liu, D., Zheng, X., and Wang, Y. (2022). Approximate nearest neighbor search using enhanced accumulative quantization. Electronics, 11.","DOI":"10.3390\/electronics11142236"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Kalantidis, Y., and Avrithis, Y. (2014, January 23\u201328). Locally Optimized Product Quantization for Approximate Nearest Neighbor Search. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.298"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"2325","DOI":"10.1007\/s11263-020-01326-x","article-title":"Product Quantization Network for Fast Visual Search","volume":"128","author":"Yu","year":"2020","journal-title":"Int. J. Comput. Vis."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"109671","DOI":"10.1016\/j.patcog.2023.109671","article-title":"Orthonormal Product Quantization Network for Scalable Face Image Retrieval","volume":"141","author":"Zhang","year":"2023","journal-title":"Pattern Recognit."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"1162","DOI":"10.1109\/TIP.2024.3359066","article-title":"Entropy-Optimized Deep Weighted Product Quantization for Image Retrieval","volume":"33","author":"Gu","year":"2024","journal-title":"IEEE Trans. Image Process."},{"key":"ref_43","unstructured":"Wang, H., Wang, W., Xu, B., Zhou, J., Du, M., Yang, K., and Xiao, Y. (2021). Tao: A Learning Framework for Adaptive Nearest Neighbor Search using Static Features Only. arXiv."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"1247","DOI":"10.1109\/TPAMI.2014.2361319","article-title":"The Inverted Multi-Index","volume":"37","author":"Babenko","year":"2014","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"824","DOI":"10.1109\/TPAMI.2018.2889473","article-title":"Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs","volume":"42","author":"Malkov","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Zhang, X., Tao, B., Jiang, D., Chen, B., Tang, D., and Liu, X. (2024). Novel probabilistic collision detection for manipulator motion planning using HNSW. Machines, 12.","DOI":"10.3390\/machines12050321"},{"key":"ref_47","unstructured":"pgvector (2025, September 05). Open-Source Vector Similarity Search for Postgres. Available online: https:\/\/github.com\/pgvector\/pgvector."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Wang, J., Yi, X., Guo, R., Jin, H., Xu, P., Li, S., Wang, X., Guo, X., Li, C., and Xu, X. (2021, January 20\u201325). Milvus: A Purpose-Built Vector Data Management System. Proceedings of the 2021 International Conference on Management of Data (SIGMOD \u201921), Online.","DOI":"10.1145\/3448016.3457550"},{"key":"ref_49","unstructured":"Jin, Y., Wu, Y., Hu, W., Maggs, B.M., Zhang, X., and Zhuo, D. (2024). Curator: Efficient Indexing for Multi-Tenant Vector Databases. arXiv."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Bruch, S., Nardini, F.M., Rulli, C., and Venturini, R. (2024, January 14\u201318). Efficient Inverted Indexes for Approximate Retrieval over Learned Sparse Representations. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR \u201924), Washington, DC, USA.","DOI":"10.1145\/3626772.3657769"},{"key":"ref_51","unstructured":"ANN-Benchmarks (2025, September 05). Benchmarking Environment for Approximate Nearest Neighbor Algorithms. Available online: https:\/\/ann-benchmarks.com\/index.html."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Andoni, A., and Indyk, P. (2006). Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions. 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS\u201906), Berkeley, CA, USA, 21\u201324 October 2006, IEEE.","DOI":"10.1109\/FOCS.2006.49"},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Dai, H., Zhu, M., and Gui, X. (2024). LSH Models in Federated Recommendation. Appl. Sci., 14.","DOI":"10.3390\/app14114423"},{"key":"ref_54","unstructured":"Cortes, C., Lawrence, N., Lee, D., Sugiyama, M., and Garnett, R. (2015). Practical and Optimal LSH for Angular Distance. Advances in Neural Information Processing Systems 28 (NIPS 2015), Proceedings of the 29th Annual Conference on Neural Information Processing Systems 2015, Montreal, QC, Canada, 7\u201312 December 2015, MIT Press. Available online: https:\/\/proceedings.neurips.cc\/paper\/2015\/file\/2823f4797102ce1a1aec05359cc16dd9-Paper.pdf."},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Romild, C.J., Schauser, T.H., and Borup, J.A. (2023). Enhancing Approximate Nearest Neighbor Search: Binary-Indexed LSH-Tries, Trie Rebuilding, and Batch Extraction. International Conference on Similarity Search and Applications, Springer Nature.","DOI":"10.1007\/978-3-031-46994-7_22"},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Datar, M., Immorlica, N., Indyk, P., and Mirrokni, V.S. (2004, January 8\u201311). Locality-Sensitive Hashing Scheme Based on p-Stable Distributions. Proceedings of the Twentieth Annual Symposium on Computational Geometry (SCG \u201904), Brooklyn, NY, USA.","DOI":"10.1145\/997817.997857"},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Indyk, P., and Motwani, R. (1998, January 23\u201326). Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality. Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing (STOC \u201998), Dallas, TX, USA.","DOI":"10.1145\/276698.276876"},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Jegou, H., Douze, M., and Schmid, C. (2008, January 12\u201318). Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search. Proceedings of the European Conference on Computer Vision (ECCV 2008), Marseille, France.","DOI":"10.1007\/978-3-540-88682-2_24"},{"key":"ref_59","doi-asserted-by":"crossref","first-page":"3603","DOI":"10.14778\/3424573.3424580","article-title":"DeltaPQ: Lossless Product Quantization Code Compression for High Dimensional Similarity Search","volume":"13","author":"Wang","year":"2020","journal-title":"Proc. VLDB Endow."},{"key":"ref_60","doi-asserted-by":"crossref","first-page":"3152","DOI":"10.14778\/3415478.3415541","article-title":"AnalyticDB-V: A Hybrid Analytical Engine Towards Query Fusion for Structured and Unstructured Data","volume":"12","author":"Wei","year":"2020","journal-title":"Proc. VLDB Endow."},{"key":"ref_61","first-page":"5745","article-title":"Multiscale Quantization for Fast Similarity Search","volume":"30","author":"Wu","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_62","unstructured":"Johnson, J., Douze, M., and Jegou, H. (2017). Billion-scale similarity search with GPUs. arXiv."},{"key":"ref_63","doi-asserted-by":"crossref","unstructured":"Silpa-Anan, C., and Hartley, R. (2008, January 23\u201328). Optimised KD-Trees for Fast Image Descriptor Matching. Proceedings of the2008 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Anchorage, AK, USA.","DOI":"10.1109\/CVPR.2008.4587638"},{"key":"ref_64","doi-asserted-by":"crossref","unstructured":"Beckmann, N., Kriegel, H.-P., Schneider, R., and Seeger, B. (1990, January 23\u201325). The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles. Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data, Atlantic City, NJ, USA.","DOI":"10.1145\/93597.98741"},{"key":"ref_65","first-page":"331","article-title":"Fast Approximate Nearest Neighbors With Automatic Algorithm Configuration","volume":"Volume 2","author":"Muja","year":"2009","journal-title":"Proceedings of the Fourth International Conference on Computer Vision Theory and Applications (VISAPP)"},{"key":"ref_66","doi-asserted-by":"crossref","unstructured":"Aum\u00fcller, M., Bernhardsson, E., and Faithfull, A. (2017). ANN-Benchmarks: A Benchmarking Tool for Approximate Nearest Neighbor Algorithms. International Conference on Similarity Search and Applications (SISAP), Springer.","DOI":"10.1007\/978-3-319-68474-1_3"},{"key":"ref_67","doi-asserted-by":"crossref","first-page":"403","DOI":"10.14778\/3368289.3368303","article-title":"Return of the Lernaean Hydra: Experimental Evaluation of Data Series Approximate Similarity Search","volume":"13","author":"Echihabi","year":"2019","journal-title":"Proc. VLDB Endow."},{"key":"ref_68","doi-asserted-by":"crossref","first-page":"461","DOI":"10.14778\/3303753.3303754","article-title":"Fast Approximate Nearest Neighbor Search with the Navigating Spreading-out Graph","volume":"12","author":"Fu","year":"2019","journal-title":"Proc. VLDB Endow."},{"key":"ref_69","doi-asserted-by":"crossref","first-page":"1475","DOI":"10.1109\/TKDE.2019.2909204","article-title":"Approximate Nearest Neighbor Search on High Dimensional Data\u2014Experiments, Analyses, and Improvement","volume":"32","author":"Li","year":"2020","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_70","doi-asserted-by":"crossref","unstructured":"Zhang, J., Khoram, S., and Li, J. (2018, January 18\u201322). Efficient Large-Scale Approximate Nearest Neighbor Search on OpenCL FPGA. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00517"},{"key":"ref_71","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/s11554-025-01670-6","article-title":"Efficient NPU\u2013GPU Scheduling for Real-Time Deep Learning Inference on Mobile Devices","volume":"22","author":"Yu","year":"2025","journal-title":"J. Real Time Image Proc."},{"key":"ref_72","doi-asserted-by":"crossref","unstructured":"Harwood, B., and Drummond, T. (2016, January 27\u201330). FANNG: Fast Approximate Nearest Neighbour Graphs. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.616"},{"key":"ref_73","doi-asserted-by":"crossref","unstructured":"Li, C., Zhang, M., Andersen, D.G., and He, Y. (2020, January 14\u201319). Improving Approximate Nearest Neighbor Search through Learned Adaptive Early Termination. Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (SIGMOD \u201820), Portland, OR, USA.","DOI":"10.1145\/3318464.3380600"},{"key":"ref_74","first-page":"475","article-title":"Learning to Route in Similarity Graphs","volume":"Volume 97","author":"Chaudhuri","year":"2019","journal-title":"Proceedings of the 36th International Conference on Machine Learning (ICML 2019)"},{"key":"ref_75","unstructured":"Bashyam, K.R., and Vadhiyar, S. (2020, January 14\u201317). Fast Scalable Approximate Nearest Neighbor Search for High-Dimensional Data. Proceedings of the 2020 IEEE International Conference on Cluster Computing (CLUSTER), Kobe, Japan."},{"key":"ref_76","doi-asserted-by":"crossref","unstructured":"Deng, S., Yan, X., Kelvin, K.W.N., Jiang, C., and Cheng, J. (2019, January 9\u201312). Pyramid: A General Framework for Distributed Similarity Search on Large-Scale Datasets. Proceedings of the 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA.","DOI":"10.1109\/BigData47090.2019.9006219"},{"key":"ref_77","unstructured":"Wallach, H., Larochelle, H., Beygelzimer, A., d\u2019Alche-Buc, F., Fox, E., and Garnett, R. (2019). DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node. Advances in Neural Information Processing Systems 32, Curran Associates, Inc.. Available online: https:\/\/proceedings.neurips.cc\/paper\/2019\/file\/09853c7fb1d3f8ee67a61b6bf4a7f8e6-Paper.pdf."},{"key":"ref_78","unstructured":"Lin, P.-C., and Zhao, W.-L. (2019). Graph based Nearest Neighbor Search: Promises and Failures. arXiv."},{"key":"ref_79","doi-asserted-by":"crossref","first-page":"106970","DOI":"10.1016\/j.patcog.2019.106970","article-title":"Hierarchical clustering-based graphs for large scale approximate nearest neighbor search","volume":"96","author":"Dias","year":"2019","journal-title":"Pattern Recognit."},{"key":"ref_80","unstructured":"PyNNDescent (2025, September 16). GitHub\u2014lmcinnes\/pynndescent: A Python Nearest Neighbor Descent for Approximate Nearest Neighbors. Available online: https:\/\/github.com\/lmcinnes\/pynndescent."},{"key":"ref_81","doi-asserted-by":"crossref","unstructured":"Nguyen, D., Lenharth, A., and Pingali, K. (2013, January 3\u20136). A Lightweight Infrastructure for Graph Analytics. Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, Farmington, PA, USA.","DOI":"10.1145\/2517349.2522739"},{"key":"ref_82","doi-asserted-by":"crossref","unstructured":"Shun, J., and Blelloch, G.E. (2013, January 23\u201327). Ligra: A Lightweight Graph Processing Framework for Shared Memory. Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP \u201813), Shenzhen, China.","DOI":"10.1145\/2442516.2442530"},{"key":"ref_83","doi-asserted-by":"crossref","unstructured":"Pandit, G., R\u00f6der, M., and Ngonga Ngomo, A.C. (2025). Evaluating Approximate Nearest Neighbour Search Systems on Knowledge Graph Embeddings. European Semantic Web Conference, Springer Nature.","DOI":"10.1007\/978-3-031-94575-5_4"},{"key":"ref_84","doi-asserted-by":"crossref","unstructured":"Zhang, K., Chen, R., and Chen, H. (2015, January 7\u201311). NUMA-Aware Graph-Structured Analytics. Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, San Francisco, CA, USA.","DOI":"10.1145\/2688500.2688507"},{"key":"ref_85","doi-asserted-by":"crossref","unstructured":"Sun, J., Vandierendonck, H., and Nikolopoulos, D.S. (2017, January 14). GraphGrind: Addressing Load Imbalance of Graph Partitioning. Proceedings of the International Conference on Supercomputing, Chicago, IL, USA.","DOI":"10.1145\/3079079.3079097"},{"key":"ref_86","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Brahmakshatriya, A., Chen, X., Dhulipala, L., Kamil, S., Amarasinghe, S., and Shun, J. (2020, January 22\u201326). Optimizing Ordered Graph Algorithms with GraphIt. Proceedings of the 18th ACM\/IEEE International Symposium on Code Generation and Optimization, San Diego, CA, USA.","DOI":"10.1145\/3368826.3377909"},{"key":"ref_87","unstructured":"Vandierendonck, H. (July, January 29). Graptor: Efficient Pull and Push Style Vectorized Graph Processing. Proceedings of the 34th ACM International Conference on Supercomputing, Barcelona, Spain."},{"key":"ref_88","doi-asserted-by":"crossref","unstructured":"Malewicz, G., Austern, M.H., Bik, A.J.C., Dehnert, J.C., Horn, I., Leiser, N., and Czajkowski, G. (2010, January 6\u201310). Pregel: A System for Large-Scale Graph Processing. Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, Indianapolis, IN, USA.","DOI":"10.1145\/1807167.1807184"},{"key":"ref_89","doi-asserted-by":"crossref","first-page":"278","DOI":"10.1109\/TC.2022.3155956","article-title":"Accelerating large-scale graph-based nearest neighbor search on a computational storage platform","volume":"72","author":"Kim","year":"2022","journal-title":"IEEE Trans. Comput."},{"key":"ref_90","unstructured":"Low, Y., Gonzalez, J.E., Kyrola, A., Bickson, D., Guestrin, C., and Hellerstein, J.M. (2014). GraphLab: A New Framework for Parallel Machine Learning. arXiv."},{"key":"ref_91","unstructured":"Gonzalez, J.E., Low, Y., Gu, H., Bickson, D., and Guestrin, C. (2012, January 8\u201310). PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs. Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI 12), Hollywood, CA, USA. Available online: https:\/\/www.usenix.org\/conference\/osdi12\/technical-sessions\/presentation\/gonzalez."},{"key":"ref_92","unstructured":"Thekkath, C. (2012, January 8\u201310). GraphChi: Large-Scale Graph Computation on Just a PC. Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI \u203212), Hollywood, CA, USA. Available online: https:\/\/www.usenix.org\/conference\/osdi12\/technical-sessions\/presentation\/kyrola."},{"key":"ref_93","doi-asserted-by":"crossref","unstructured":"Roy, A., Mihailovic, I., and Zwaenepoel, W. (2013, January 3\u20136). X-Stream: Edge-Centric Graph Processing Using Streaming Partitions. Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, Farmington, PA, USA.","DOI":"10.1145\/2517349.2522740"},{"key":"ref_94","doi-asserted-by":"crossref","unstructured":"Khorasani, F., Vora, K., Gupta, R., and Bhuyan, L.N. (2014, January 23\u201327). CuSha: Vertex-Centric Graph Processing on GPUs. Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing, Vancouver, BC, Canada.","DOI":"10.1145\/2600212.2600227"},{"key":"ref_95","doi-asserted-by":"crossref","unstructured":"Wang, Y., Davidson, A., Pan, Y., Wu, Y., Riffel, A., and Owens, J.D. (2016, January 12\u201316). Gunrock: A High-Performance Graph Processing Library on the GPU. Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP \u201916), Barcelona, Spain. Article 11.","DOI":"10.1145\/2851141.2851145"},{"key":"ref_96","doi-asserted-by":"crossref","unstructured":"Sengupta, D., Song, S.L., Agarwal, K., and Schwan, K. (2015, January 15\u201320). GraphReduce: Processing Large-Scale Graphs on Accelerator-Based Systems. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Austin, TX, USA.","DOI":"10.1145\/2807591.2807655"},{"key":"ref_97","doi-asserted-by":"crossref","unstructured":"Han, W., Mawhirter, D., Wu, B., and Buland, M. (2017, January 9\u201310). Graphie: Large- Scale Asynchronous Graph Traversals on Just a GPU. Proceedings of the 2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT), Portland, OR, USA.","DOI":"10.1109\/PACT.2017.41"},{"key":"ref_98","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1145\/79173.79181","article-title":"A Bridging Model for Parallel Computation","volume":"33","author":"Valiant","year":"1990","journal-title":"Commun. ACM"},{"key":"ref_99","doi-asserted-by":"crossref","unstructured":"Peng, J., Zhang, X., Yuan, K., Peng, X., and Yang, G. (2025). GDVI-Fusion: Enhancing Accuracy with Optimal Geometry Matching and Deep Nearest Neighbor Optimization. Appl. Sci., 15.","DOI":"10.3390\/app15168875"},{"key":"ref_100","doi-asserted-by":"crossref","unstructured":"Naumov, M., Vrielink, A., and Garland, M. (2017, January 13). Parallel Depth-First Search for Directed Acyclic Graphs. Proceedings of the Seventh Workshop on Irregular Applications: Architectures and Algorithms, Denver, CO, USA.","DOI":"10.1145\/3149704.3149764"},{"key":"ref_101","doi-asserted-by":"crossref","first-page":"795","DOI":"10.1162\/tacl_a_00346","article-title":"Best-First Beam Search","volume":"8","author":"Meister","year":"2020","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"ref_102","unstructured":"Peng, Z., Zhang, M., Li, K., Jin, R., and Ren, B. (arXiv, 2022). Speed-ANN: Low-Latency and High-Accuracy Nearest Neighbor Search via Intra-Query Parallelism, arXiv."},{"key":"ref_103","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1109\/TCOM.1980.1094577","article-title":"An Algorithm for Vector Quantizer Design","volume":"28","author":"Linde","year":"1980","journal-title":"IEEE Trans. Commun."},{"key":"ref_104","doi-asserted-by":"crossref","first-page":"20077","DOI":"10.1007\/s11042-022-11952-x","article-title":"Vector quantization using whale optimization algorithm for digital image compression","volume":"81","author":"Rahebi","year":"2022","journal-title":"Multimed. Tools Appl."},{"key":"ref_105","doi-asserted-by":"crossref","unstructured":"Zhang, X., and Wu, X. (2023, January 18\u201322). Lvqac: Lattice vector quantization coupled with spatially adaptive companding for efficient learned image compression. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.00987"},{"key":"ref_106","doi-asserted-by":"crossref","first-page":"4512","DOI":"10.1109\/TSP.2022.3205474","article-title":"Graph signal compression by joint quantization and sampling","volume":"70","author":"Li","year":"2022","journal-title":"IEEE Trans. Signal Process."},{"key":"ref_107","unstructured":"Amato, G., Carrara, F., Falchi, F., Gennaro, C., Rabitti, F., and Vadicamo, L. (2020, January 21\u201324). Scalar Quantization-Based Text Encoding for Large Scale Image Retrieval. Proceedings of the 2020 Italian Symposium on Advanced Database Systems (SEBD 2020), Online. Available online: https:\/\/ceur-ws.org\/Vol-2646\/10-paper.pdf."},{"key":"ref_108","unstructured":"Veasey, T., and Trent, B. (2025, July 01). Scalar Quantization Optimized for Vector Databases. Available online: https:\/\/www.elastic.co\/search-labs\/blog\/vector-db-optimized-scalar-quantization."},{"key":"ref_109","first-page":"2","article-title":"A Survey of Product Quantization","volume":"6","author":"Matsui","year":"2018","journal-title":"ITE Trans. Media Technol. Appl."},{"key":"ref_110","doi-asserted-by":"crossref","unstructured":"Babenko, A., and Lempitsky, V. The inverted multi-index. Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA.","DOI":"10.1109\/CVPR.2012.6248038"},{"key":"ref_111","first-page":"2185","article-title":"Online Product Quantization","volume":"30","author":"Xu","year":"2018","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_112","unstructured":"PINECONE (2025, July 01). Faiss: The Missing Manual. Available online: https:\/\/www.pinecone.io\/learn\/series\/faiss\/vector-indexes\/."},{"key":"ref_113","doi-asserted-by":"crossref","first-page":"613","DOI":"10.1016\/j.ins.2022.06.086","article-title":"A new fast inverted file-based algorithm for approximate nearest neighbor search without accuracy reduction","volume":"608","author":"Liu","year":"2022","journal-title":"Inf. Sci."},{"key":"ref_114","doi-asserted-by":"crossref","unstructured":"Liu, J., Ouzzani, I., Li, W., Zhang, L., Ou, T., Bouamor, H., Jin, Z., and Diab, M.T. (2024). Towards Global AI Inclusivity: A Large-Scale Multilingual Terminology Dataset (GIST). arXiv.","DOI":"10.18653\/v1\/2025.findings-acl.1148"},{"key":"ref_115","doi-asserted-by":"crossref","unstructured":"Kazakovtsev, V., and Markushin, E. (2025). Data Segmentation Through Two-Level Clustering with Greedy Approach. ITM Web of Conferences, EDP Sciences.","DOI":"10.1051\/itmconf\/20257204007"},{"key":"ref_116","first-page":"178","article-title":"An opensource library for AutoML multimodal clustering on Apache Spark","volume":"540","author":"Muravyov","year":"2024","journal-title":"Zap. Nauchnykh Semin. POMI"},{"key":"ref_117","doi-asserted-by":"crossref","unstructured":"Ahmatshin, F., and Kazakovtsev, L. (2024). Mini-Batch K-Means++ Clustering Initialization. International Conference on Mathematical Optimization Theory and Operations Research, Springer Nature.","DOI":"10.1007\/978-3-031-73365-9_20"},{"key":"ref_118","doi-asserted-by":"crossref","first-page":"040003","DOI":"10.1063\/5.0124952","article-title":"A (1+ \u03bb) evolutionary algorithm with the greedy agglomerative mutation for p-median problems","volume":"Volume 2700","author":"Kazakovtsev","year":"2023","journal-title":"AIP Conference Proceedings"},{"key":"ref_119","doi-asserted-by":"crossref","unstructured":"Kazakovtsev, L., Shkaberina, G., Rozhnov, I., Li, R., and Kazakovtsev, V. (2020). Genetic Algorithms with the Crossover-Like Mutation Operator for the K-Means Problem. International Conference on Mathematical Optimization Theory and Operations Research, Springer International Publishing.","DOI":"10.1007\/978-3-030-58657-7_28"},{"key":"ref_120","unstructured":"Elkan, C. (2003, January 21\u201324). Using the triangle inequality to accelerate k-means. Proceedings of the 20th International Conference on Machine Learning (ICML-03), Washington, DC, USA."}],"container-title":["Big Data and Cognitive Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-2289\/9\/10\/254\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T18:51:04Z","timestamp":1760035864000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-2289\/9\/10\/254"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,9]]},"references-count":120,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2025,10]]}},"alternative-id":["bdcc9100254"],"URL":"https:\/\/doi.org\/10.3390\/bdcc9100254","relation":{},"ISSN":["2504-2289"],"issn-type":[{"value":"2504-2289","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,10,9]]}}}