{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,2]],"date-time":"2026-05-02T04:44:30Z","timestamp":1777697070511,"version":"3.51.4"},"reference-count":11,"publisher":"SAGE Publications","issue":"6","license":[{"start":{"date-parts":[[2025,10,28]],"date-time":"2025-10-28T00:00:00Z","timestamp":1761609600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Intelligent Decision Technologies"],"published-print":{"date-parts":[[2025,11]]},"abstract":"<jats:p>The discipline of Entity Resolution (ER), the process of identifying and linking records that refer to the same real-world entity, has been fundamentally reshaped by the adoption of high-dimensional vector embeddings. This transformation reframes ER as a large-scale Approximate Nearest Neighbor Search (ANNS) problem, making the choice of ANNS architecture a critical determinant of system performance. This paper provides a deep architectural comparison and a novel, large-scale empirical evaluation of the two dominant ANNS paradigms: graph-based methods (HNSW, DiskANN) and partition-based methods (Faiss-IVF+PQ, Scann). We introduce a new semi-synthetic benchmark tailored to the ER task, consisting of two one-million-vector datasets with a known ground truth. On this benchmark, we conduct a comprehensive evaluation, measuring not only total query time but also disaggregated blocking and matching times, alongside canonical ER quality metrics: precision, recall, and F1-score. Our findings reveal that partition-based methods, particularly Scann, offer superior performance in high-throughput, moderate-recall scenarios, while graph-based methods like HNSW and DiskANN are unequivocally superior for applications demanding the highest levels of matching quality. This work provides a nuanced, application-centric analysis that culminates in a set of actionable recommendations for practitioners designing modern data integration and retrieval systems.<\/jats:p>","DOI":"10.1177\/18724981251388888","type":"journal-article","created":{"date-parts":[[2025,10,28]],"date-time":"2025-10-28T15:39:10Z","timestamp":1761665950000},"page":"3826-3840","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":0,"title":["A comparative analysis of graph-based and partition-based approximate nearest neighbor search for large-scale entity resolution"],"prefix":"10.1177","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3878-5988","authenticated-orcid":false,"given":"Dimitrios","family":"Karapiperis","sequence":"first","affiliation":[{"name":"International Hellenic University, Thermi, Greece"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6602-0723","authenticated-orcid":false,"given":"Leonidas","family":"Akritidis","sequence":"additional","affiliation":[{"name":"International Hellenic University, Thermi, Greece"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9435-1829","authenticated-orcid":false,"given":"Panayiotis","family":"Bozanis","sequence":"additional","affiliation":[{"name":"International Hellenic University, Thermi, Greece"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9758-0819","authenticated-orcid":false,"given":"Vassilios S","family":"Verykios","sequence":"additional","affiliation":[{"name":"Hellenic Open University"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"179","published-online":{"date-parts":[[2025,10,28]]},"reference":[{"key":"e_1_3_3_2_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-31164-2"},{"key":"e_1_3_3_3_2","unstructured":"Devlin J Chang MW Lee K et al. BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies Volume 1 (Long and Short Papers) pp.4171\u20134186."},{"key":"e_1_3_3_4_2","doi-asserted-by":"crossref","unstructured":"Reimers N Gurevych I. Sentence-BERT: sentence embeddings using siamese BERT-Networks. In: Proceedings of the 2019 conference on empirical methods in natural language processing (EMNLP-IJCNLP) pp.3980\u20133990.","DOI":"10.18653\/v1\/D19-1410"},{"key":"e_1_3_3_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2018.2889473"},{"key":"e_1_3_3_6_2","unstructured":"Subramanya JS Devvrit F Simhadri HV et al. Diskann: fast accurate billion-point nearest neighbor search on a single node. In: Advances in neural information processing systems volume 32. Curran Associates Inc."},{"key":"e_1_3_3_7_2","doi-asserted-by":"crossref","unstructured":"Douze M Guzhva A Deng CH et al. The faiss library 2024.","DOI":"10.1109\/TBDATA.2025.3618474"},{"key":"e_1_3_3_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2010.57"},{"key":"e_1_3_3_9_2","unstructured":"Sun RGP Lindgren E Geng Q et al. Accelerating large-scale inference with anisotropic vector quantization. In: International conference on machine learning."},{"key":"e_1_3_3_10_2","doi-asserted-by":"publisher","DOI":"10.3390\/electronics14183605"},{"key":"e_1_3_3_11_2","unstructured":"Wang W Wei F Dong L et al. Deep self-attention distillation for task-agnostic compression of pre-trained transformers. In: Advances in neural information processing systems 33."},{"key":"e_1_3_3_12_2","unstructured":"Singh A Subramanya S Krishnaswamy R et al. Freshdiskann: a fast and accurate graph-based ann index for streaming similarity search 2021. 2105.09613."}],"container-title":["Intelligent Decision Technologies"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/18724981251388888","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/18724981251388888","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/18724981251388888","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T09:21:47Z","timestamp":1777454507000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/18724981251388888"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,28]]},"references-count":11,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2025,11]]}},"alternative-id":["10.1177\/18724981251388888"],"URL":"https:\/\/doi.org\/10.1177\/18724981251388888","relation":{},"ISSN":["1872-4981","1875-8843"],"issn-type":[{"value":"1872-4981","type":"print"},{"value":"1875-8843","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,10,28]]}}}