{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,28]],"date-time":"2026-03-28T19:16:49Z","timestamp":1774725409530,"version":"3.50.1"},"reference-count":29,"publisher":"Association for Computing Machinery (ACM)","issue":"9","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2016,5]]},"abstract":"<jats:p>Entity Resolution is a core task for merging data collections. Due to its quadratic complexity, it typically scales to large volumes of data through blocking: similar entities are clustered into blocks and pair-wise comparisons are executed only between co-occurring entities, at the cost of some missed matches. There are numerous blocking methods, and the aim of this work is to offer a comprehensive empirical survey, extending the dimensions of comparison beyond what is commonly available in the literature. We consider 17 state-of-the-art blocking methods and use 6 popular real datasets to examine the robustness of their internal configurations and their relative balance between effectiveness and time efficiency. 
We also investigate their scalability over a corpus of 7 established synthetic datasets that range from 10,000 to 2 million entities.<\/jats:p>","DOI":"10.14778\/2947618.2947624","type":"journal-article","created":{"date-parts":[[2016,7,26]],"date-time":"2016-07-26T13:28:39Z","timestamp":1469539719000},"page":"684-695","source":"Crossref","is-referenced-by-count":103,"title":["Comparative analysis of approximate blocking techniques for entity resolution"],"prefix":"10.14778","volume":"9","author":[{"given":"George","family":"Papadakis","sequence":"first","affiliation":[{"name":"University of Athens, Greece"}]},{"given":"Jonathan","family":"Svirsky","sequence":"additional","affiliation":[{"name":"Israel Institute of Technology"}]},{"given":"Avigdor","family":"Gal","sequence":"additional","affiliation":[{"name":"Israel Institute of Technology"}]},{"given":"Themis","family":"Palpanas","sequence":"additional","affiliation":[{"name":"Paris Descartes University, France"}]}],"member":"320","published-online":{"date-parts":[[2016,5]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.5555\/1105926.1106227"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2006.13"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2005.125"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/1401890.1402020"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2011.127"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2007.9"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1969.10501049"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2783258.2783396"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.14778\/2733004.2733068"},{"key":"e_1_2_1_10_1","volume-title":"EDBT (tutorial)","author":"Gal A.","year":"2015","unstructured":"A. Gal and B. Kimelfeld . 
Entity resolution in the big data era: Probabilistic db support to entity resolution. In EDBT (tutorial), 2015."},{"key":"e_1_2_1_11_1","first-page":"371","volume-title":"Model and Algorithms. In VLDB","author":"Galhardas H.","year":"2001","unstructured":"H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C. Saita. Declarative Data Cleaning: Language, Model and Algorithms. In VLDB, pages 371--380, 2001."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.14778\/2367502.2367564"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/1410358.1410359"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2005.166"},{"key":"e_1_2_1_15_1","first-page":"491","volume-title":"VLDB","author":"Gravano L.","year":"2001","unstructured":"L. Gravano, P. Ipeirotis, H. Jagadish, N. Koudas, S. Muthukrishnan, and D. Srivastava. Approximate string joins in a database (almost) for free. In VLDB, pages 491--500, 2001."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/568271.223807"},{"key":"e_1_2_1_17_1","volume-title":"WebDB","author":"Isele R.","year":"2011","unstructured":"R. Isele, A. Jentzsch, and C. Bizer. Efficient multidimensional blocking for link discovery without losing recall. 
In WebDB, 2011."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.is.2012.11.008"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.datak.2009.10.003"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/2433396.2433439"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/347090.347123"},{"key":"e_1_2_1_22_1","first-page":"440","volume-title":"AAAI","author":"Michelson M.","year":"2006","unstructured":"M. Michelson and C. A. Knoblock. Learning blocking schemes for record linkage. In AAAI, pages 440--445, 2006."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.5555\/2283696.2283783"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.14778\/2856318.2856326"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1998076.1998093"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2012.150"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2013.54"},{"key":"e_1_2_1_28_1","first-page":"221","volume-title":"EDBT","author":"Papadakis G.","year":"2016","unstructured":"G. Papadakis, G. Papastefanatos, T. Palpanas, and M. Koubarakis. Scaling entity resolution to large, heterogeneous data with enhanced meta-blocking. 
In EDBT, pages 221--232, 2016."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/1559845.1559870"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/2947618.2947624","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T09:33:33Z","timestamp":1672220013000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/2947618.2947624"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,5]]},"references-count":29,"journal-issue":{"issue":"9","published-print":{"date-parts":[[2016,5]]}},"alternative-id":["10.14778\/2947618.2947624"],"URL":"https:\/\/doi.org\/10.14778\/2947618.2947624","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2016,5]]}}}