{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T15:25:02Z","timestamp":1776093902826,"version":"3.50.1"},"reference-count":35,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2007,3,1]],"date-time":"2007-03-01T00:00:00Z","timestamp":1172707200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Knowl. Discov. Data"],"published-print":{"date-parts":[[2007,3]]},"abstract":"<jats:p>\n            Many databases contain uncertain and imprecise references to real-world entities. The absence of identifiers for the underlying entities often results in a database which contains multiple references to the same entity. This can lead not only to data redundancy, but also inaccuracies in query processing and knowledge extraction. These problems can be alleviated through the use of\n            <jats:italic>entity resolution<\/jats:italic>\n            . Entity resolution involves discovering the underlying entities and mapping each database reference to these entities. Traditionally, entities are resolved using pairwise similarity over the attributes of references. However, there is often additional relational information in the data. Specifically, references to different entities may cooccur. In these cases, collective entity resolution, in which entities for cooccurring references are determined jointly rather than independently, can improve entity resolution accuracy. We propose a novel relational clustering algorithm that uses both attribute and relational information for determining the underlying domain entities, and we give an efficient implementation. 
We investigate the impact that different relational similarity measures have on entity resolution quality. We evaluate our collective entity resolution algorithm on multiple real-world databases. We show that it improves entity resolution performance over both attribute-based baselines and over algorithms that consider relational information but do not resolve entities collectively. In addition, we perform detailed experiments on synthetically generated data to identify data characteristics that favor collective relational resolution over purely attribute-based algorithms.\n          <\/jats:p>","DOI":"10.1145\/1217299.1217304","type":"journal-article","created":{"date-parts":[[2007,6,8]],"date-time":"2007-06-08T15:00:08Z","timestamp":1181314808000},"page":"5","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":389,"title":["Collective entity resolution in relational data"],"prefix":"10.1145","volume":"1","author":[{"given":"Indrajit","family":"Bhattacharya","sequence":"first","affiliation":[{"name":"University of Maryland, College Park, MD"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lise","family":"Getoor","sequence":"additional","affiliation":[{"name":"University of Maryland, College Park, MD"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2007,3]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0378-8733(03)00009-1"},{"key":"e_1_2_1_2_1","volume-title":"The International Conference on Very Large Databases (VLDB)","author":"Ananthakrishna R.","unstructured":"Ananthakrishna, R., Chaudhuri, S., and Ganti, V. 2002. Eliminating fuzzy duplicates in data warehouses. 
In The International Conference on Very Large Databases (VLDB). Hong Kong, China."},{"key":"e_1_2_1_3_1","volume-title":"Swoosh: A generic approach to entity resolution. Tech. rep.","author":"Benjelloun O.","year":"2005","unstructured":"Benjelloun, O., Garcia-Molina, H., Su, Q., and Widom, J. 2005. Swoosh: A generic approach to entity resolution. Tech. rep., Stanford University. (March)"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/1008694.1008697"},{"key":"e_1_2_1_5_1","doi-asserted-by":"crossref","unstructured":"Bhattacharya I. and Getoor L. 2006a. Mining graph data. In Entity Resolution in Graphs. L. Holder and D. Cook Eds. John Wiley.","DOI":"10.1002\/9780470073049.ch13"},{"key":"e_1_2_1_6_1","volume-title":"The SIAM Conference on Data Mining (SIAM-SDM)","author":"Bhattacharya I.","unstructured":"Bhattacharya, I. and Getoor, L. 2006b. A latent dirichlet model for unsupervised entity resolution. In The SIAM Conference on Data Mining (SIAM-SDM). 
Bethesda, MD."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1150402.1150463"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/956750.956759"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/MIS.2003.1234765"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/872757.872796"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/352595.352598"},{"key":"e_1_2_1_12_1","volume-title":"The IJCAI Workshop on Information Integration on the Web (IIWeb)","author":"Cohen W.","unstructured":"Cohen, W., Ravikumar, P., and Fienberg, S. 2003. A comparison of string distance metrics for name-matching tasks. In The IJCAI Workshop on Information Integration on the Web (IIWeb). Acapulco, Mexico."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/775047.775116"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1066157.1066168"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1969.10501049"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/276675.276685"},{"key":"e_1_2_1_17_1","volume-title":"The IEEE International Conference on Data Engineering (ICDE)","author":"Gravano L.","unstructured":"Gravano, L., Ipeirotis, P., Koudas, N., and Srivastava, D. 2003. Text joins for data cleansing and integration in an RDBMS. In The IEEE International Conference on Data Engineering (ICDE). 
Bangalore, India."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/223784.223807"},{"key":"e_1_2_1_19_1","volume-title":"The SIAM International Conference on Data Mining (SIAM SDM)","author":"Kalashnikov D.","unstructured":"Kalashnikov, D., Mehrotra, S., and Chen, Z. 2005. Exploiting relationships for domain-independent data cleaning. In The SIAM International Conference on Data Mining (SIAM SDM). Newport Beach, CA."},{"key":"e_1_2_1_20_1","first-page":"45","article-title":"Semantic integration in text: From ambiguous names to identifiable entities. AI Magazine","volume":"26","author":"Li X.","year":"2005","unstructured":"Li, X., Morie, P., and Roth, D. 2005. Semantic integration in text: From ambiguous names to identifiable entities. AI Magazine. Special Issue on Semantic Integration 26, 1, 45--58.","journal-title":"Special Issue on Semantic Integration"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/956863.956972"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/347090.347123"},{"key":"e_1_2_1_23_1","volume-title":"The Annual Conference on Neural Information Processing Systems (NIPS)","author":"McCallum A.","unstructured":"McCallum, A. and Wellner, B. 2004. Conditional models of identity uncertainty with application to noun coreference. 
In The Annual Conference on Neural Information Processing Systems (NIPS). Vancouver, Canada."},{"key":"e_1_2_1_24_1","volume-title":"The International Conference on Knowledge Discovery and Data Mining (SIGKDD)","author":"Monge A.","unstructured":"Monge, A. and Elkan, C. 1996. The field matching problem: Algorithms and applications. In The International Conference on Knowledge Discovery and Data Mining (SIGKDD). Portland, ME."},{"key":"e_1_2_1_25_1","volume-title":"The SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD)","author":"Monge A.","unstructured":"Monge, A. and Elkan, C. 1997. An efficient domain-independent algorithm for detecting approximately duplicate database records. In The SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD). Tucson, AZ."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/375360.375365"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1126\/science.130.3381.954"},{"key":"e_1_2_1_28_1","volume-title":"The Annual Conference on Neural Information Processing Systems (NIPS)","author":"Pasula H.","unstructured":"Pasula, H., Marthi, B., Milch, B., Russell, S., and Shpitser, I. 2003. Identity uncertainty and citation matching. In The Annual Conference on Neural Information Processing Systems (NIPS). 
Vancouver, Canada."},{"key":"e_1_2_1_29_1","volume-title":"The Conference on Uncertainty in Artificial Intelligence (UAI)","author":"Ravikumar P.","unstructured":"Ravikumar, P. and Cohen, W. 2004. A hierarchical graphical model for record linkage. In The Conference on Uncertainty in Artificial Intelligence (UAI). Banff, Canada."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/34.682181"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/775047.775087"},{"key":"e_1_2_1_32_1","volume-title":"The ACM SIGKDD Workshop on Multi-Relational Data Mining (MRDM)","author":"Singla P.","unstructured":"Singla, P. and Domingos, P. 2004. Multi-relational record linkage. In The ACM SIGKDD Workshop on Multi-Relational Data Mining (MRDM). Seattle, WA."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0306-4379(01)00042-4"},{"key":"e_1_2_1_34_1","volume-title":"Statistical Research Division","author":"Winkler W.","unstructured":"Winkler, W. 1999. The state of record linkage and current research problems. Tech. rep., Statistical Research Division, U.S. Census Bureau, Washington, DC."},{"key":"e_1_2_1_35_1","volume-title":"Statistical Research Division","author":"Winkler W.","unstructured":"Winkler, W. 2002. Methods for record linkage and Bayesian networks. Tech. 
rep., Statistical Research Division, U.S. Census Bureau, Washington, DC."}],"container-title":["ACM Transactions on Knowledge Discovery from Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1217299.1217304","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1217299.1217304","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T22:29:33Z","timestamp":1750285773000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1217299.1217304"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2007,3]]},"references-count":35,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2007,3]]}},"alternative-id":["10.1145\/1217299.1217304"],"URL":"https:\/\/doi.org\/10.1145\/1217299.1217304","relation":{},"ISSN":["1556-4681","1556-472X"],"issn-type":[{"value":"1556-4681","type":"print"},{"value":"1556-472X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2007,3]]},"assertion":[{"value":"2007-03-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}