{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,12]],"date-time":"2026-03-12T12:20:48Z","timestamp":1773318048981,"version":"3.50.1"},"reference-count":38,"publisher":"Association for Computing Machinery (ACM)","issue":"4","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2014,12]]},"abstract":"<jats:p>In many real world applications, the same item may be described by multiple sources. As a consequence, conflicts among these sources are inevitable, which leads to an important task: how to identify which piece of information is trustworthy, i.e., the truth discovery task. Intuitively, if the piece of information is from a reliable source, then it is more trustworthy, and the source that provides trustworthy information is more reliable. Based on this principle, truth discovery approaches have been proposed to infer source reliability degrees and the most trustworthy information (i.e., the truth) simultaneously. However, existing approaches overlook the ubiquitous long-tail phenomenon in the tasks, i.e., most sources only provide a few claims and only a few sources make plenty of claims, which causes the source reliability estimation for small sources to be unreasonable. To tackle this challenge, we propose a confidence-aware truth discovery (CATD) method to automatically detect truths from conflicting data with long-tail phenomenon. The proposed method not only estimates source reliability, but also considers the confidence interval of the estimation, so that it can effectively reflect real source reliability for sources with various levels of participation. 
Experiments on four real world tasks as well as simulated multi-source long-tail datasets demonstrate that the proposed method outperforms existing state-of-the-art truth discovery approaches by successfully discounting the effect of small sources.<\/jats:p>","DOI":"10.14778\/2735496.2735505","type":"journal-article","created":{"date-parts":[[2015,5,12]],"date-time":"2015-05-12T15:37:52Z","timestamp":1431445072000},"page":"425-436","source":"Crossref","is-referenced-by-count":273,"title":["A confidence-aware approach for truth discovery on long-tail data"],"prefix":"10.14778","volume":"8","author":[{"given":"Qi","family":"Li","sequence":"first","affiliation":[{"name":"SUNY Buffalo, Buffalo, NY"}]},{"given":"Yaliang","family":"Li","sequence":"additional","affiliation":[{"name":"SUNY Buffalo, Buffalo, NY"}]},{"given":"Jing","family":"Gao","sequence":"additional","affiliation":[{"name":"SUNY Buffalo, Buffalo, NY"}]},{"given":"Lu","family":"Su","sequence":"additional","affiliation":[{"name":"SUNY Buffalo, Buffalo, NY"}]},{"given":"Bo","family":"Zhao","sequence":"additional","affiliation":[{"name":"Microsoft Research, Mountain View, CA"}]},{"given":"Murat","family":"Demirbas","sequence":"additional","affiliation":[{"name":"SUNY Buffalo, Buffalo, NY"}]},{"given":"Wei","family":"Fan","sequence":"additional","affiliation":[{"name":"Huawei Noah's Ark Lab, Hong Kong"}]},{"given":"Jiawei","family":"Han","sequence":"additional","affiliation":[{"name":"University of Illinois, Urbana, IL"}]}],"member":"320","published-online":{"date-parts":[[2014,12]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/2424321.2424335"},{"key":"e_1_2_1_2_1","first-page":"2946","volume-title":"IAAI","author":"Aydin B. I.","year":"2014","unstructured":"B. I. Aydin, Y. S. Yilmaz, Y. Li, Q. Li, J. Gao, and M. 
Demirbas. Crowdsourcing for multiple-choice question answering. In IAAI, pages 2946--2953, 2014."},{"key":"e_1_2_1_3_1","first-page":"255","volume-title":"Proc. of ICML","author":"Bachrach Y.","year":"2012","unstructured":"Y. Bachrach, T. Minka, J. Guiver, and T. Graepel. How to grade a test without knowing the answers -- a bayesian graphical model for adaptive crowdsourcing and aptitude testing. In Proc. of ICML, pages 255--262, 2012."},{"key":"e_1_2_1_4_1","volume-title":"Proc. of WWW","author":"Bleiholder J.","year":"2006","unstructured":"J. Bleiholder and F. Naumann. Conflict handling strategies in an integrated information system. In Proc. of WWW, 2006."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1456650.1456651"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.5555\/993483"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1137\/070710111"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687627.1687690"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-36257-6_13"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687553.1687620"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.14778\/2535568.2448938"},{"key":"e_1_2_1_12_1","volume-title":"Data quality: Theory and practice. Web-Age Information Management, page 1--16","author":"Fan W.","year":"2012","unstructured":"W. Fan. Data quality: Theory and practice. 
Web-Age Information Management, page 1--16, 2012."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-41660-6_12"},{"key":"e_1_2_1_14_1","volume-title":"A practical guide to heavy tails: statistical techniques and applications","author":"Feldman R.","year":"1998","unstructured":"R. Feldman and M. Taqqu. A practical guide to heavy tails: statistical techniques and applications. Springer, 1998."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-05813-9_30"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1718487.1718504"},{"key":"e_1_2_1_17_1","volume-title":"Pearson Education","author":"Hogg R. V.","year":"2005","unstructured":"R. V. Hogg, J. McKean, and A. T. Craig. Introduction to mathematical statistics. Pearson Education, 2005."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2610509"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.14778\/2535568.2448943"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.14778\/3402707.3402731"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/PASSAT\/SocialCom.2011.188"},{"issue":"2","key":"e_1_2_1_22_1","first-page":"21","article-title":"Data fusion in three steps: Resolving schema, tuple, and value inconsistencies","volume":"29","author":"Naumann F.","year":"2006","unstructured":"F. Naumann, A. Bilke, J. Bleiholder, and M. Weis. Data fusion in three steps: Resolving schema, tuple, and value inconsistencies. 
IEEE Data Engineering Bulletin, 29(2): 21--31, 2006.","journal-title":"IEEE Data Engineering Bulletin"},{"key":"e_1_2_1_23_1","first-page":"877","volume-title":"Proc. of COLING","author":"Pasternack J.","year":"2010","unstructured":"J. Pasternack and D. Roth. Knowing what to believe (when you already know something). In Proc. of COLING, pages 877--885, 2010."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.5555\/2283696.2283785"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2488388.2488476"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/2488388.2488479"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2610504"},{"key":"e_1_2_1_28_1","first-page":"85","volume-title":"Proc. of NSDI","author":"Shen G.","year":"2013","unstructured":"G. Shen, Z. Chen, P. Zhang, T. Moscibroda, and Y. Zhang. Walkie-markie: indoor pathway mapping made easy. In Proc. of NSDI, pages 85--98, 2013."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/1401890.1401965"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/2339530.2339571"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2020408.2020567"},{"key":"e_1_2_1_32_1","first-page":"2424","volume-title":"NIPS","volume":"10","author":"Welinder P.","year":"2010","unstructured":"P. Welinder, S. Branson, S. Belongie, and P. Perona. The multidimensional wisdom of crowds. 
In NIPS, volume 10, pages 2424--2432, 2010."},{"key":"e_1_2_1_33_1","first-page":"2035","volume-title":"NIPS","volume":"22","author":"Whitehill J.","year":"2009","unstructured":"J. Whitehill, P. Ruvolo, T. Wu, J. Bergsma, and J. R. Movellan. Whose vote should count more: Optimal integration of labels from labelers of unknown expertise. In NIPS, volume 22, pages 2035--2043, 2009."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2007.190745"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/1963405.1963439"},{"key":"e_1_2_1_36_1","volume-title":"Proc. of QDB","author":"Zhao B.","year":"2012","unstructured":"B. Zhao and J. Han. A probabilistic model for estimating real-valued truth from conflicting sources. In Proc. of QDB, 2012."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.14778\/2168651.2168656"},{"key":"e_1_2_1_38_1","first-page":"2204","volume-title":"NIPS","author":"Zhou D.","year":"2012","unstructured":"D. Zhou, J. C. Platt, S. Basu, and Y. Mao. Learning from the wisdom of crowds by minimax entropy. 
In NIPS, pages 2204--2212, 2012."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/2735496.2735505","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T09:32:29Z","timestamp":1672219949000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/2735496.2735505"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,12]]},"references-count":38,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2014,12]]}},"alternative-id":["10.14778\/2735496.2735505"],"URL":"https:\/\/doi.org\/10.14778\/2735496.2735505","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2014,12]]}}}