{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,12]],"date-time":"2026-03-12T15:49:06Z","timestamp":1773330546736,"version":"3.50.1"},"reference-count":37,"publisher":"Association for Computing Machinery (ACM)","issue":"10","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2014,6]]},"abstract":"<jats:p>The task of <jats:italic>data fusion<\/jats:italic> is to identify the true values of data items (<jats:italic>e.g.<\/jats:italic>, the true date of birth for <jats:italic>Tom Cruise<\/jats:italic>) among multiple observed values drawn from different sources (<jats:italic>e.g.<\/jats:italic>, Web sites) of varying (and unknown) reliability. A recent survey [20] has provided a detailed comparison of various fusion methods on Deep Web data. In this paper, we study the applicability and limitations of different fusion techniques on a more challenging problem: <jats:italic>knowledge fusion<\/jats:italic>. Knowledge fusion identifies true subject-predicate-object triples extracted by multiple information extractors from multiple information sources. These extractors perform the tasks of entity linkage and schema alignment, thus introducing an additional source of noise that is quite different from that traditionally considered in the data fusion literature, which only focuses on factual errors in the original sources. We adapt state-of-the-art data fusion techniques and apply them to a knowledge base with 1.6B unique knowledge triples extracted by 12 extractors from over 1B Web pages, which is three orders of magnitude larger than the data sets used in previous data fusion papers. 
We show great promise of the data fusion approaches in solving the knowledge fusion problem, and suggest interesting research directions through a detailed error analysis of the methods.\n          <\/jats:p>","DOI":"10.14778\/2732951.2732962","type":"journal-article","created":{"date-parts":[[2015,5,12]],"date-time":"2015-05-12T15:37:52Z","timestamp":1431445072000},"page":"881-892","source":"Crossref","is-referenced-by-count":148,"title":["From data fusion to knowledge fusion"],"prefix":"10.14778","volume":"7","author":[{"given":"Xin Luna","family":"Dong","sequence":"first","affiliation":[{"name":"Google Inc."}]},{"given":"Evgeniy","family":"Gabrilovich","sequence":"additional","affiliation":[{"name":"Google Inc."}]},{"given":"Geremy","family":"Heitz","sequence":"additional","affiliation":[{"name":"Google Inc."}]},{"given":"Wilko","family":"Horn","sequence":"additional","affiliation":[{"name":"Google Inc."}]},{"given":"Kevin","family":"Murphy","sequence":"additional","affiliation":[{"name":"Google Inc."}]},{"given":"Shaohua","family":"Sun","sequence":"additional","affiliation":[{"name":"Google Inc."}]},{"given":"Wei","family":"Zhang","sequence":"additional","affiliation":[{"name":"Google 
Inc."}]}],"member":"320","published-online":{"date-parts":[[2014,6]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-16518-4"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.5555\/1883784.1883795"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1456650.1456651"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/1376616.1376746"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0169-7552(98)00110-X"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.14778\/1453856.1453916"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.5555\/2898607.2898816"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.14778\/2180912.2180920"},{"key":"e_1_2_1_9_1","first-page":"137","volume-title":"OSDI","author":"Dean J.","year":"2004","unstructured":"J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In OSDI, pages 137--149, 2004."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1921008"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687627.1687690"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687627.1687691"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687553.1687620"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.14778\/2535568.2448938"},{"key":"e_1_2_1_15_1","volume-title":"Statistical methods for rates and proportions","author":"Fleiss J.","year":"1981","unstructured":"J. Fleiss. Statistical methods for rates and proportions. John Wiley and Sons, 1981."},{"key":"e_1_2_1_16_1","first-page":"413","volume-title":"WWW","author":"Gal\u00e1rraga L. A.","year":"2013","unstructured":"L. A. 
Gal\u00e1rraga, C. Teflioudi, K. Hose, and F. Suchanek. Amie: association rule mining under incomplete evidence in ontological knowledge bases. In WWW, pages 413--422, 2013."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/1718487.1718504"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.14778\/2732286.2732288"},{"key":"e_1_2_1_19_1","volume-title":"SODA","author":"Kleinberg J. M.","year":"1998","unstructured":"J. M. Kleinberg. Authoritative sources in a hyperlinked environment. In SODA, 1998."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.14778\/2535568.2448943"},{"key":"e_1_2_1_21_1","volume-title":"PVLDB","author":"Liu X.","year":"2011","unstructured":"X. Liu, X. L. Dong, B. C. Ooi, and D. Srivastava. Online data fusion. PVLDB, 4(12), 2011."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.14778\/1454159.1454163"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.5555\/1690219.1690287"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.4018\/jswis.2012070103"},{"key":"e_1_2_1_25_1","first-page":"877","volume-title":"COLING","author":"Pasternack J.","year":"2010","unstructured":"J. Pasternack and D. Roth. Knowing what to believe (when you already know something). 
In COLING, pages 877--885, 2010."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.5555\/2283696.2283785"},{"key":"e_1_2_1_27_1","volume-title":"WWW","author":"Pasternack J.","year":"2013","unstructured":"J. Pasternack and D. Roth. Latent credibility analysis. In WWW, 2013."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2593674"},{"key":"e_1_2_1_29_1","volume-title":"WWW","author":"Qi G.-J.","year":"2013","unstructured":"G.-J. Qi, C. Aggarwal, J. Han, and T. Huang. Mining collective intelligence in groups. In WWW, 2013."},{"key":"e_1_2_1_30_1","volume-title":"NAACL","author":"Ratinov L.","year":"2011","unstructured":"L. Ratinov, D. Roth, D. Downey, and M. Anderson. Local and global algorithms for disambiguation to Wikipedia. In NAACL, 2011."},{"key":"e_1_2_1_31_1","first-page":"1","article-title":"Modeling missing data in distant supervision for information extraction","author":"Ritter A.","year":"2013","unstructured":"A. Ritter, L. Zettlemoyer, Mausam, and O. Etzioni. Modeling missing data in distant supervision for information extraction. Trans. Assoc. Comp. Linguistics, 1, 2013.","journal-title":"Trans. Assoc. Comp. 
Linguistics"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/1242572.1242667"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2509558.2509561"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/1281192.1281309"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/1963405.1963439"},{"key":"e_1_2_1_36_1","volume-title":"QDB","author":"Zhao B.","year":"2012","unstructured":"B. Zhao and J. Han. A probabilistic model for estimating real-valued truth from conflicting sources. In QDB, 2012."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.14778\/2168651.2168656"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/2732951.2732962","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T10:58:35Z","timestamp":1672225115000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/2732951.2732962"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,6]]},"references-count":37,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2014,6]]}},"alternative-id":["10.14778\/2732951.2732962"],"URL":"https:\/\/doi.org\/10.14778\/2732951.2732962","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2014,6]]}}}