{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,2]],"date-time":"2026-07-02T13:17:17Z","timestamp":1782998237159,"version":"3.54.5"},"reference-count":20,"publisher":"Association for Computing Machinery (ACM)","issue":"2","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2012,12]]},"abstract":"<jats:p>The amount of useful information available on the Web has been growing at a dramatic pace in recent years and people rely more and more on the Web to fulfill their information needs. In this paper, we study truthfulness of Deep Web data in two domains where we believed data are fairly clean and data quality is important to people's lives: Stock and Flight. To our surprise, we observed a large amount of inconsistency on data from different sources and also some sources with quite low accuracy. We further applied on these two data sets state-of-the-art data fusion methods that aim at resolving conflicts and finding the truth, analyzed their strengths and limitations, and suggested promising research directions. We wish our study can increase awareness of the seriousness of conflicting data on the Web and in turn inspire more research in our community to tackle this problem.<\/jats:p>","DOI":"10.14778\/2535568.2448943","type":"journal-article","created":{"date-parts":[[2014,6,24]],"date-time":"2014-06-24T12:17:57Z","timestamp":1403612277000},"page":"97-108","source":"Crossref","is-referenced-by-count":178,"title":["Truth finding on the deep web"],"prefix":"10.14778","volume":"6","author":[{"given":"Xian","family":"Li","sequence":"first","affiliation":[{"name":"SUNY at Binghamton"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Xin Luna","family":"Dong","sequence":"additional","affiliation":[{"name":"AT&amp;T Labs-Research"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Kenneth","family":"Lyons","sequence":"additional","affiliation":[{"name":"AT&amp;T Labs-Research"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Weiyi","family":"Meng","sequence":"additional","affiliation":[{"name":"SUNY at Binghamton"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Divesh","family":"Srivastava","sequence":"additional","affiliation":[{"name":"AT&amp;T Labs-Research"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2012,12]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"CIDR","author":"Berti-Equille L.","year":"2009","unstructured":"L. Berti-Equille , A. D. Sarma , X. L. Dong , A. Marian , and D. Srivastava . Sailing the information ocean with awareness of currents: Discovery and application of source dependence . In CIDR , 2009 . L. Berti-Equille, A. D. Sarma, X. L. Dong, A. Marian, and D. Srivastava. Sailing the information ocean with awareness of currents: Discovery and application of source dependence. In CIDR, 2009."},{"key":"e_1_2_1_2_1","first-page":"83","article-title":"Probabilistic models to reconcile complex data from inaccurate data sources","author":"Blanco L.","year":"2010","unstructured":"L. Blanco , V. Crescenzi , P. Merialdo , and P. Papotti . Probabilistic models to reconcile complex data from inaccurate data sources . In CAiSE , 83 - 97 , 2010 . L. Blanco, V. Crescenzi, P. Merialdo, and P. Papotti. Probabilistic models to reconcile complex data from inaccurate data sources. In CAiSE, 83-97, 2010.","journal-title":"CAiSE"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1456650.1456651"},{"issue":"7","key":"e_1_2_1_4_1","first-page":"680","article-title":"An analysis of structured data on the web","volume":"5","author":"Dalvi N.","year":"2012","unstructured":"N. Dalvi , A. Machanavajjhala , and B. Pang . An analysis of structured data on the web . PVLDB , 5 ( 7 ): 680 - 691 , 2012 . N. Dalvi, A. Machanavajjhala, and B. Pang. An analysis of structured data on the web. PVLDB, 5(7):680-691, 2012.","journal-title":"PVLDB"},{"issue":"1","key":"e_1_2_1_5_1","first-page":"1358","article-title":"Global detection of complex copying relationships between sources","volume":"3","author":"Dong X. L.","year":"2010","unstructured":"X. L. Dong , L. Berti-Equille , Y. Hu , and D. Srivastava . Global detection of complex copying relationships between sources . PVLDB , 3 ( 1 ): 1358 - 1369 , 2010 . X. L. Dong, L. Berti-Equille, Y. Hu, and D. Srivastava. Global detection of complex copying relationships between sources. PVLDB, 3(1):1358-1369, 2010.","journal-title":"PVLDB"},{"issue":"1","key":"e_1_2_1_6_1","first-page":"550","article-title":"Integrating conflicting data: the role of source dependence","volume":"2","author":"Dong X. L.","year":"2009","unstructured":"X. L. Dong , L. Berti-Equille , and D. Srivastava . Integrating conflicting data: the role of source dependence . PVLDB , 2 ( 1 ): 550 - 561 , 2009 . X. L. Dong, L. Berti-Equille, and D. Srivastava. Integrating conflicting data: the role of source dependence. PVLDB, 2(1):550-561, 2009.","journal-title":"PVLDB"},{"issue":"1","key":"e_1_2_1_7_1","first-page":"562","article-title":"Truth discovery and copying detection in a dynamic world","volume":"2","author":"Dong X. L.","year":"2009","unstructured":"X. L. Dong , L. Berti-Equille , and D. Srivastava . Truth discovery and copying detection in a dynamic world . PVLDB , 2 ( 1 ): 562 - 573 , 2009 . X. L. Dong, L. Berti-Equille, and D. Srivastava. Truth discovery and copying detection in a dynamic world. PVLDB, 2(1):562-573, 2009.","journal-title":"PVLDB"},{"issue":"2","key":"e_1_2_1_8_1","first-page":"1654","article-title":"Data fusion-resolving data conflicts for integration","volume":"2","author":"Dong X. L.","year":"2009","unstructured":"X. L. Dong and F. Naumann . Data fusion-resolving data conflicts for integration . PVLDB , 2 ( 2 ): 1654 - 1655 , 2009 . X. L. Dong and F. Naumann. Data fusion-resolving data conflicts for integration. PVLDB, 2(2):1654-1655, 2009.","journal-title":"PVLDB"},{"key":"e_1_2_1_9_1","volume-title":"Less is more: Selecting sources wisely for integration. PVLDB, 6(2)","author":"Dong X. L.","year":"2013","unstructured":"X. L. Dong , B. Saha , and D. Srivastava . Less is more: Selecting sources wisely for integration. PVLDB, 6(2) , 2013 . X. L. Dong, B. Saha, and D. Srivastava. Less is more: Selecting sources wisely for integration. PVLDB, 6(2), 2013."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/1718487.1718504"},{"key":"e_1_2_1_11_1","first-page":"668","article-title":"Authoritative sources in a hyperlinked environment","author":"Kleinberg J. M.","year":"1998","unstructured":"J. M. Kleinberg . Authoritative sources in a hyperlinked environment . In SODA , 668 - 677 , 1998 . J. M. Kleinberg. Authoritative sources in a hyperlinked environment. In SODA, 668-677, 1998.","journal-title":"SODA"},{"key":"e_1_2_1_12_1","unstructured":"X. Li X. L. Dong K. B. Lyons W. Meng and D. Srivastava. Truth Finding on the Deep Web: Is the Problem Solved? http:\/\/lunadong.com\/publication\/webfusion_report.pdf.   X. Li X. L. Dong K. B. Lyons W. Meng and D. Srivastava. Truth Finding on the Deep Web: Is the Problem Solved? http:\/\/lunadong.com\/publication\/webfusion_report.pdf."},{"key":"e_1_2_1_13_1","first-page":"877","article-title":"Knowing what to believe (when you already know something)","author":"Pasternack J.","year":"2010","unstructured":"J. Pasternack and D. Roth . Knowing what to believe (when you already know something) . In COLING , 877 - 885 , 2010 . J. Pasternack and D. Roth. Knowing what to believe (when you already know something). In COLING, 877-885, 2010.","journal-title":"COLING"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.5591\/978-1-57735-516-8\/IJCAI11-387"},{"issue":"2","key":"e_1_2_1_15_1","first-page":"1662","article-title":"Information theory for data management","volume":"2","author":"Srivastava D.","year":"2009","unstructured":"D. Srivastava and S. Venkatasubramanian . Information theory for data management . PVLDB , 2 ( 2 ): 1662 - 1663 , 2009 . D. Srivastava and S. Venkatasubramanian. Information theory for data management. PVLDB, 2(2):1662-1663, 2009.","journal-title":"PVLDB"},{"key":"e_1_2_1_16_1","volume-title":"Proc. of the WebDB Workshop","author":"Wu M.","year":"2007","unstructured":"M. Wu and A. Marian . Corroborating answers from multiple web sources . In Proc. of the WebDB Workshop , 2007 . M. Wu and A. Marian. Corroborating answers from multiple web sources. In Proc. of the WebDB Workshop, 2007."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.is.2010.08.008"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2007.190745"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/1963405.1963439"},{"issue":"6","key":"e_1_2_1_20_1","first-page":"550","article-title":"A bayesian approach to discovering truth from conflicting sources for data integration","volume":"5","author":"Zhao B.","year":"2012","unstructured":"B. Zhao , B. I. P. Rubinstein , J. Gemmell , and J. Han . A bayesian approach to discovering truth from conflicting sources for data integration . PVLDB , 5 ( 6 ): 550 - 561 , 2012 . B. Zhao, B. I. P. Rubinstein, J. Gemmell, and J. Han. A bayesian approach to discovering truth from conflicting sources for data integration. PVLDB, 5(6):550-561, 2012.","journal-title":"PVLDB"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/2535568.2448943","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T11:06:01Z","timestamp":1672225561000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/2535568.2448943"}},"subtitle":["is the problem solved?"],"short-title":[],"issued":{"date-parts":[[2012,12]]},"references-count":20,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2012,12]]}},"alternative-id":["10.14778\/2535568.2448943"],"URL":"https:\/\/doi.org\/10.14778\/2535568.2448943","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2012,12]]}}}