{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T04:01:49Z","timestamp":1774929709048,"version":"3.50.1"},"reference-count":32,"publisher":"Association for Computing Machinery (ACM)","issue":"1-2","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2010,9]]},"abstract":"<jats:p>\n            Many data-management applications require integrating data from a variety of sources, where different sources may refer to the same real-world entity in different ways and some may even provide erroneous data. An important task in this process is to recognize and merge the various references that refer to the same entity. In practice, some attributes satisfy a\n            <jats:italic>uniqueness<\/jats:italic>\n            constraint---each real-world entity (or most entities) has a unique value for the attribute (\n            <jats:italic>e.g.<\/jats:italic>\n            , business contact phone, address, and email). Traditional techniques tackle this case by first linking records that are likely to refer to the same real-world entity, and then fusing the linked records and resolving conflicts if any. Such methods can fall short for three reasons: first, erroneous values from sources may prevent correct linking; second, the real world may contain exceptions to the uniqueness constraints and always enforcing uniqueness can miss correct values; third, locally resolving conflicts for linked records may overlook important global evidence.\n          <\/jats:p>\n          <jats:p>\n            This paper proposes a novel technique to solve this problem. 
The key component of our solution is to reduce the problem into a\n            <jats:italic>k<\/jats:italic>\n            -partite graph clustering problem and consider in clustering both similarity of attribute values and the sources that associate a pair of values in the same record. Thus, we perform global linkage and fusion simultaneously, and can identify incorrect values and differentiate them from alternative representations of the correct value from the beginning. In addition, we extend our algorithm to be tolerant to a few violations of the uniqueness constraints. Experimental results show accuracy and scalability of our technique.\n          <\/jats:p>","DOI":"10.14778\/1920841.1920897","type":"journal-article","created":{"date-parts":[[2014,6,24]],"date-time":"2014-06-24T12:17:57Z","timestamp":1403612277000},"page":"417-428","source":"Crossref","is-referenced-by-count":42,"title":["Record linkage with uniqueness constraints and erroneous values"],"prefix":"10.14778","volume":"3","author":[{"given":"Songtao","family":"Guo","sequence":"first","affiliation":[{"name":"AT&T Interactive Research"}]},{"given":"Xin Luna","family":"Dong","sequence":"additional","affiliation":[{"name":"AT&T Labs-Research"}]},{"given":"Divesh","family":"Srivastava","sequence":"additional","affiliation":[{"name":"AT&T Labs-Research"}]},{"given":"Remi","family":"Zajac","sequence":"additional","affiliation":[{"name":"AT&T Interactive Research"}]}],"member":"320","published-online":{"date-parts":[[2010,9]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"SODA","author":"Aslam J.","year":"1999","unstructured":"J. Aslam, K. Pelekhov, and D. Rus. A practical clustering algorithm for static and dynamic information organization. 
In SODA, 1999."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1023\/B:MACH.0000033116.57574.95"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1217299.1217304"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/1015330.1015360"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/956750.956759"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1066157.1066175"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1247480.1247530"},{"key":"e_1_2_1_8_1","volume-title":"IIWEB","author":"Cohen W. W.","year":"2003","unstructured":"W. W. Cohen, P. Ravikumar, and S. E. Fienberg. A comparison of string distance metrics for name-matching tasks. In IIWEB, 2003."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.1979.4766909"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/1066157.1066168"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687627.1687690"},{"key":"e_1_2_1_12_1","volume-title":"PVLDB","author":"Dong X. L.","year":"2009","unstructured":"X. L. Dong and F. Naumann. Data fusion-resolving data conflicts for integration. PVLDB, 2009."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1080\/01969727408546059"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2007.9"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1376916.1376940"},{"issue":"1","key":"e_1_2_1_16_1","first-page":"757","article-title":"Reasoning about record matching rules","volume":"2","author":"Fan W.","year":"2009","unstructured":"W. 
Fan, X. Jia, J. Li, and S. Ma. Reasoning about record matching rules. PVLDB, 2(1):757--768, 2009.","journal-title":"PVLDB"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1007\/BFb0006133"},{"key":"e_1_2_1_18_1","unstructured":"S. Guo X. L. Dong D. Srivastava and R. Zajac. Record linkage with uniqueness constraints and erroneous values. http:\/\/www.research.att.com\/~lunadong\/publication\/linkage_techReport.pdf."},{"key":"e_1_2_1_19_1","volume-title":"VLDB","author":"Ilyas I. F.","year":"2004","unstructured":"I. F. Ilyas, V. Markl, P. J. Haas, P. G. Brown, and A. Aboulnaga. Cords: Automatic generation of correlation statistics in db2. In VLDB, 2004."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2009.219"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/1142473.1142599"},{"key":"e_1_2_1_22_1","series-title":"Lecture Notes in Computer Science, 223","volume-title":"The complexity of optimization problems","author":"Krentel M. W.","year":"1986","unstructured":"M. W. Krentel. The complexity of optimization problems. Lecture Notes in Computer Science, 223, 1986."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1002\/nav.3800020109"},{"key":"e_1_2_1_24_1","first-page":"388","volume-title":"WSEAS","author":"Legany C.","year":"2006","unstructured":"C. Legany, S. Juhasz, and A. Babos. Cluster validity measurement techniques. 
In WSEAS, pages 388--393, 2006."},{"key":"e_1_2_1_25_1","volume-title":"How much information?","author":"Lyman P.","year":"2003","unstructured":"P. Lyman, H. R. Varian, K. Swearingen, P. Charles, N. Good, L. L. Jordan, and J. Pal. How much information? 2003. http:\/\/www2.sims.berkeley.edu\/research\/projects\/how-much-info-2003\/execsum.htm."},{"key":"e_1_2_1_26_1","volume-title":"NORDSEC","author":"Petrovic S.","year":"2006","unstructured":"S. Petrovic. A comparison between the silhouette index and the davies-bouldin index in labelling ids clusters. In NORDSEC, 2006."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1007\/s007780100057"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1016\/0377-0427(87)90125-7"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/11611257_51"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.5555\/645504.656273"},{"key":"e_1_2_1_31_1","volume-title":"Statistical Research Division","author":"Winkler W.","year":"2006","unstructured":"W. Winkler. Overview of record linkage and current research directions. Technical report, Statistical Research Division, U. S. 
Bureau of the Census, 2006."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2010.5447904"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/1920841.1920897","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T11:36:06Z","timestamp":1672227366000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/1920841.1920897"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,9]]},"references-count":32,"journal-issue":{"issue":"1-2","published-print":{"date-parts":[[2010,9]]}},"alternative-id":["10.14778\/1920841.1920897"],"URL":"https:\/\/doi.org\/10.14778\/1920841.1920897","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2010,9]]}}}