{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,5]],"date-time":"2026-02-05T04:36:21Z","timestamp":1770266181022,"version":"3.49.0"},"reference-count":15,"publisher":"Association for Computing Machinery (ACM)","issue":"1","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2009,8]]},"abstract":"<jats:p>Many data management applications, such as setting up Web portals, managing enterprise data, managing community data, and sharing scientific data, require integrating data from multiple sources. Each of these sources provides a set of values and different sources can often provide conflicting values. To present quality data to users, it is critical that data integration systems can resolve conflicts and discover true values. Typically, we expect a true value to be provided by more sources than any particular false one, so we can take the value provided by the majority of the sources as the truth. Unfortunately, a false value can be spread through copying and that makes truth discovery extremely tricky. In this paper, we consider how to find true values from conflicting information when there are a large number of sources, among which some may copy from others.<\/jats:p>\n          <jats:p>\n            We present a novel approach that considers\n            <jats:italic>dependence<\/jats:italic>\n            between data sources in truth discovery. Intuitively, if two data sources provide a large number of common values and many of these values are rarely provided by other sources (\n            <jats:italic>e.g.<\/jats:italic>\n            , particular false values), it is very likely that one copies from the other. We apply Bayesian analysis to decide dependence between sources and design an algorithm that iteratively detects dependence and discovers truth from conflicting information. 
We also extend our model by considering\n            <jats:italic>accuracy<\/jats:italic>\n            of data sources and\n            <jats:italic>similarity<\/jats:italic>\n            between values. Our experiments on synthetic data as well as real-world data show that our algorithm can significantly improve accuracy of truth discovery and is scalable when there are a large number of data sources.\n          <\/jats:p>","DOI":"10.14778\/1687627.1687690","type":"journal-article","created":{"date-parts":[[2014,6,24]],"date-time":"2014-06-24T12:17:57Z","timestamp":1403612277000},"page":"550-561","source":"Crossref","is-referenced-by-count":294,"title":["Integrating conflicting data"],"prefix":"10.14778","volume":"2","author":[{"given":"Xin Luna","family":"Dong","sequence":"first","affiliation":[{"name":"AT&T Labs--Research, Florham Park, NJ"}]},{"given":"Laure","family":"Berti-Equille","sequence":"additional","affiliation":[{"name":"Universit\u00e9 de Rennes, Rennes cedex, France"}]},{"given":"Divesh","family":"Srivastava","sequence":"additional","affiliation":[{"name":"AT&T Labs--Research, Florham Park, NJ"}]}],"member":"320","published-online":{"date-parts":[[2009,8]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"CIDR","author":"Berti-Equille L.","year":"2009","unstructured":"L. Berti-Equille, A. D. Sarma, X. L. Dong, A. Marian, and D. Srivastava. Sailing the information ocean with awareness of currents: Discovery and application of source dependence. In CIDR, 2009."},{"key":"e_1_2_1_2_1","volume-title":"WWW","author":"Bleiholder J.","year":"2006","unstructured":"J. Bleiholder and F. Naumann. 
Conflict handling strategies in an integrated information system. In WWW, 2006."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1052934.1052942"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0169-7552(98)00110-X"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1376916.1376918"},{"key":"e_1_2_1_7_1","unstructured":"X. L. Dong, L. Berti-Equille, and D. Srivastava. Integrating conflicting data: the role of source dependence. http:\/\/www.research.att.com\/~lunadong\/publication\/indep_techReport.pdf."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.2307\/2981768"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/775152.775242"},{"key":"e_1_2_1_10_1","volume-title":"SODA","author":"Kleinberg J. M.","year":"1998","unstructured":"J. M. Kleinberg. Authoritative sources in a hyperlinked environment. In SODA, 1998."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1287\/opre.31.5.866"},{"key":"e_1_2_1_12_1","unstructured":"Master data management. http:\/\/en.wikipedia.org\/wiki\/Master_Data_Management."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/872757.872770"},{"key":"e_1_2_1_14_1","volume-title":"IEEE Intl. Conf. on Peer-to-Peer Computing","author":"Singh A.","year":"2003","unstructured":"A. Singh and L. Liu. TrustMe: anonymous management of trust relationships in decentralized P2P systems. In IEEE Intl. Conf. 
on Peer-to-Peer Computing, 2003."},{"key":"e_1_2_1_15_1","volume-title":"Proc. of WebDB","author":"Wu M.","year":"2007","unstructured":"M. Wu and A. Marian. Corroborating answers from multiple web sources. In Proc. of WebDB, 2007."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1281192.1281309"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/1687627.1687690","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T11:29:38Z","timestamp":1672226978000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/1687627.1687690"}},"subtitle":["the role of source dependence"],"short-title":[],"issued":{"date-parts":[[2009,8]]},"references-count":15,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2009,8]]}},"alternative-id":["10.14778\/1687627.1687690"],"URL":"https:\/\/doi.org\/10.14778\/1687627.1687690","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2009,8]]}}}