{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,20]],"date-time":"2026-02-20T19:03:16Z","timestamp":1771614196166,"version":"3.50.1"},"reference-count":67,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2024,3,19]],"date-time":"2024-03-19T00:00:00Z","timestamp":1710806400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"crossref","award":["MA 3964\/8-2"],"award-info":[{"award-number":["MA 3964\/8-2"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Ministry of Education & Research, Germany","award":["01PQ17001"],"award-info":[{"award-number":["01PQ17001"]}]},{"name":"DFG","award":["460234259, and 460676019"],"award-info":[{"award-number":["460234259, and 460676019"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["J. Data and Information Quality"],"published-print":{"date-parts":[[2024,3,31]]},"abstract":"<jats:p>\n            In entity resolution,\n            <jats:italic>blocking<\/jats:italic>\n            pre-partitions data for further processing by more expensive methods. Two entity mentions are in the same block if they share identical or related\n            <jats:italic>blocking-keys<\/jats:italic>\n            . Previous work has sometimes related blocking keys by grouping or alphabetically sorting them, but\u2014as was shown for author disambiguation\u2014the respective equivalences or total orders are not necessarily well-suited to model the logical matching-relation between blocking keys. 
To address this, we present a novel blocking approach that exploits the subset\n            <jats:italic>partial<\/jats:italic>\n            order over entity representations to build a matching-based bipartite graph, using connected components as blocks. To prevent over- and underconnectedness, we allow specification of overly general and generalization of overly specific representations. To build the bipartite graph, we contribute a new parallelized algorithm with configurable time\/space tradeoff for minimal element search in the subset partial order. As a job-based approach, it combines dynamic scalability and easier integration to make it more convenient than the previously described approaches. Experiments on large gold standards for publication records, author mentions, and affiliation strings suggest that our approach is competitive in performance and allows better addressing of domain-specific problems. For duplicate detection and author disambiguation, our method offers the expected performance as defined by the vector-similarity baseline used in another work on the same dataset and the common surname, first-initial baseline. 
For top-level institution resolution, we have reproduced the challenges described in prior work, strengthening the conclusion that for affiliation data, overlapping blocks under minimal elements are more suitable than connected components.\n          <\/jats:p>","DOI":"10.1145\/3646553","type":"journal-article","created":{"date-parts":[[2024,2,20]],"date-time":"2024-02-20T12:26:58Z","timestamp":1708432018000},"page":"1-29","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Connected Components for Scaling Partial-order Blocking to Billion Entities"],"prefix":"10.1145","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2492-5297","authenticated-orcid":false,"given":"Tobias","family":"Backes","sequence":"first","affiliation":[{"name":"GESIS - Leibniz Institute for the Social Sciences, Cologne, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-4364-9243","authenticated-orcid":false,"given":"Stefan","family":"Dietze","sequence":"additional","affiliation":[{"name":"GESIS - Leibniz Institute for the Social Sciences, Cologne, Germany"}]}],"member":"320","published-online":{"date-parts":[[2024,3,19]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1137\/0201008"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-47451-4_8"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/3197026.3197036"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/3269206.3271699"},{"key":"e_1_3_2_6_2","volume-title":"Partial Orders and Progressive Blocking: A Matching-based Framework for Large-scale Entity Resolution in Bibliographic Data","author":"Backes Tobias","year":"2023","unstructured":"Tobias Backes. 2023. Partial Orders and Progressive Blocking: A Matching-based Framework for Large-scale Entity Resolution in Bibliographic Data. Ph. D. Dissertation. 
Universit\u00e4ts-und Landesbibliothek der Heinrich-Heine-Universit\u00e4t D\u00fcsseldorf."},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.is.2022.102056"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00799-022-00326-1"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611972818.3"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394957"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10115-015-0895-7"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/509907.509965"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.5555\/2449288.2449299"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511809071"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/3418896"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.5555\/2876221"},{"key":"e_1_3_2_17_2","unstructured":"Nicola De Cao Gautier Izacard Sebastian Riedel and Fabio Petroni. 2021. Autoregressive Entity Retrieval. Retrieved from http:\/\/arxiv.org\/abs\/2010.00904"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1162\/qss_a_00013"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.14778\/3236187.3236198"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.14778\/2876473.2876474"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2013.07.004"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.14778\/3538598.3538611"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE51399.2021.00112"},{"key":"e_1_3_2_24_2","first-page":"901","volume-title":"Proceedings of the 12th Language Resources and Evaluation Conference","author":"Gyawali Bikash","year":"2020","unstructured":"Bikash Gyawali, Lucas Anastasiou, and Petr Knoth. 2020. Deduplication of scholarly documents using locality sensitive hashing and word embeddings. 
In Proceedings of the 12th Language Resources and Evaluation Conference. European Language Resources Association, 901\u2013910."},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/1148170.1148222"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/362248.362272"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1007\/11408079_69"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.5555\/3618408.3619049"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1920904"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/1837210.1837221"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i15.17562"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11390-020-0350-4"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00141"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1007\/s41019-019-00107-y"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2015.7113293"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1145\/1242572.1242592"},{"issue":"4","key":"e_1_3_2_37_2","article-title":"A practical algorithm for finding extremal sets up to permutation","volume":"9","author":"Marinov Martin","year":"2014","unstructured":"Martin Marinov and D. Gregg. 2014. A practical algorithm for finding extremal sets up to permutation. J. Experim. Algor. 9, 4 (2014).","journal-title":"J. Experim. Algor."},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/2893184"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3196926"},{"key":"e_1_3_2_40_2","volume-title":"An Introduction to Duplicate Detection","author":"Nauman Felix","year":"2022","unstructured":"Felix Nauman and Melanie Herschel. 2022. An Introduction to Duplicate Detection. 
Springer Nature."},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/3377455"},{"key":"e_1_3_2_42_2","unstructured":"Ralph Peeters and Christian Bizer. 2023. Entity Matching using Large Language Models. Retrieved from http:\/\/arxiv.org\/abs\/2310.11244"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF01261654"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1016\/0020-0190(95)00165-4"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1016\/S0020-0190(97)00084-7"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1006\/jagm.1999.1032"},{"key":"e_1_3_2_47_2","volume-title":"Evaluation of Cohort Algorithms for the FloC API","author":"Ravichandran Deepak","year":"2021","unstructured":"Deepak Ravichandran and Sergei Vassilvitski. 2021. Evaluation of Cohort Algorithms for the FloC API. Technical Report. Google Research & Ads."},{"key":"e_1_3_2_48_2","volume-title":"Disambiguation of Author Addresses in Bibliometric Databases","author":"Rimmert C.","year":"2017","unstructured":"C. Rimmert, H. Schwechheimer, and M. Winterhager. 2017. Disambiguation of Author Addresses in Bibliometric Databases. Technical Report. Bielefeld University."},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0245122"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.3233\/SW-222986"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1080\/00207169808804719"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1080\/00207169608804512"},{"key":"e_1_3_2_53_2","doi-asserted-by":"crossref","unstructured":"Wei Shen Yuhan Li Yinan Liu Jiawei Han Jianyong Wang and Xiaojie Yuan. 2021. Entity Linking Meets Deep Learning: Techniques and Solutions. 
Retrieved from http:\/\/arxiv.org\/abs\/2109.12520","DOI":"10.1109\/TKDE.2021.3117715"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2014.2327028"},{"key":"e_1_3_2_55_2","volume-title":"Entity Resolution and Information Quality","author":"Talburt John R.","year":"2011","unstructured":"John R. Talburt. 2011. Entity Resolution and Information Quality. Elsevier."},{"issue":"12","key":"e_1_3_2_56_2","article-title":"An iterative, self-assessing entity resolution system: First steps toward a data washing machine","volume":"11","author":"Talburt John R.","year":"2020","unstructured":"John R. Talburt, Daniel Pullen, Leon Claassens, Richard Wang, et\u00a0al. 2020. An iterative, self-assessing entity resolution system: First steps toward a data washing machine. Int. J. Advanc. Comput. Sci. Applic. 11, 12 (2020).","journal-title":"Int. J. Advanc. Comput. Sci. Applic."},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-36257-6_11"},{"key":"e_1_3_2_58_2","volume-title":"Proceedings of the NeurIPS 2022 First Table Representation Workshop","author":"Tang Jiawei","year":"2022","unstructured":"Jiawei Tang, Yifei Zuo, Lei Cao, and Samuel Madden. 2022. Generic entity resolution models. In Proceedings of the NeurIPS 2022 First Table Representation Workshop."},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.14778\/3476249.3476294"},{"key":"e_1_3_2_60_2","unstructured":"Milena Trajanoska Riste Stojanov and Dimitar Trajanov. 2023. Enhancing Knowledge Graph Construction Using Large Language Models. Retrieved from http:\/\/arxiv.org\/abs\/2305.04676"},{"key":"e_1_3_2_61_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2019.00181"},{"key":"e_1_3_2_62_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2015.2468711"},{"key":"e_1_3_2_63_2","unstructured":"Yifan Wang. 2022. A Survey on Efficient Processing of Similarity Queries over Neural Embeddings. 
Retrieved from http:\/\/arxiv.org\/abs\/2204.07922"},{"key":"e_1_3_2_64_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-020-00644-3"},{"key":"e_1_3_2_65_2","doi-asserted-by":"publisher","DOI":"10.5555\/139404.139481"},{"key":"e_1_3_2_66_2","doi-asserted-by":"publisher","DOI":"10.1016\/0020-0190(93)90264-A"},{"key":"e_1_3_2_67_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11704-015-5900-5"},{"key":"e_1_3_2_68_2","doi-asserted-by":"crossref","unstructured":"Alexandros Zeakis George Papadakis Dimitrios Skoutas and Manolis Koubarakis. 2023. Pre-trained Embeddings for Entity Resolution: An Experimental Analysis [Experiment Analysis & Benchmark]. Retrieved from http:\/\/arxiv.org\/abs\/2304.12329","DOI":"10.14778\/3598581.3598594"}],"container-title":["Journal of Data and Information Quality"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3646553","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3646553","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:17:41Z","timestamp":1750295861000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3646553"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3,19]]},"references-count":67,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,3,31]]}},"alternative-id":["10.1145\/3646553"],"URL":"https:\/\/doi.org\/10.1145\/3646553","relation":{},"ISSN":["1936-1955","1936-1963"],"issn-type":[{"value":"1936-1955","type":"print"},{"value":"1936-1963","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,3,19]]},"assertion":[{"value":"2023-08-09","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2023-12-27","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-03-19","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}